Final Report for AOARD Grant 104060

“Generalized Entropies and Legendre Duality”
22/04/2012

Name of Principal Investigators: Keiko Uohashi

e-mail address: uohashi@tjcc.tohoku-gakuin.ac.jp

Institution: Department of Mechanical Engineering & Intelligent Systems,
Faculty of Engineering, Tohoku Gakuin University
Mailing Address: 1-13-1 Chuo, Tagajo, Miyagi 985-8537, Japan

- Phone: +81-22-368-7284

- Fax: +81-22-368-7070

Period of Performance: 22/04/2010 - 22/04/2011


Abstract: Making use of the conformally flattened structure of alpha-geometry, we have shown that a simple and computationally efficient algorithm can be derived to construct alpha-Voronoi diagrams on the space of discrete probability distributions. The geometry of q-exponential families, which is related to alpha-geometry, and its statistical applications are also studied. In addition, we have studied conformal flatness of level surfaces in Hessian domains. In particular, we have also studied harmonic maps between level surfaces of Hessian domains, in relation to their conformally flat structure.

Introduction: Along the line of geometric study of generalized entropies and Legendre structures, we have elucidated a relation between the alpha-geometry and the escort probability, which is an important tool in the arguments on Tsallis's generalized entropy, in the following paper: A. Ohara, H. Matsuzoe and S. Amari, A dually flat structure on the space of escort distributions, 2010 J. Phys.: Conf. Ser. 201 012012 (http://iopscience.iop.org/1742-6596/201/1/012012). There we observed that conformal flattening of the alpha-geometry introduces the escort probabilities as affine coordinates in the resulting dually flat geometry on the space of probability distributions. While this result is still purely mathematical and its implications from the viewpoint of statistical physics remain to be clarified, we have found an interesting application to information science.

A q-exponential family is a set of probability distributions that naturally generalizes the standard exponential family and is related to many physical phenomena in so-called “complex systems” that obey power laws. A q-exponential family simultaneously carries a geometric structure of constant curvature and a dually flat structure. To describe the relation between these structures, we introduce a conformal transformation on statistical manifolds; this clarifies the relation and yields several important properties. As applications of the geometry of q-exponential families, a geometric generalization of statistical inference is also proposed and studied.

We have also studied Hessian domains, which are typical examples of flat statistical manifolds. It is known that level surfaces of a Hessian domain are 1-conformally flat statistical submanifolds. Conversely, we showed conditions under which 1-conformally flat statistical leaves of a foliation can be realized as level surfaces of their common Hessian domain. In addition, we study harmonic maps between level surfaces of a Hessian domain with 1-, (-1)-, and, in general, alpha-conformally flat connections. Harmonic maps are a generalization of critical points of a function and have been studied in geometry, physics, and other fields. For example, H. Shima gave conditions for harmonicity of gradient mappings of level surfaces on a Hessian domain. However, that work investigated harmonic maps from level surfaces into a dual affine space, not into other level surfaces. K. Nomizu and T. Sasaki calculated the Laplacian of centro-affine immersions into an affine space, but we find no description of harmonic maps between two centro-affine hypersurfaces. We therefore began investigating harmonic maps between two level surfaces.


Experiment: None
Results and Discussion: We demonstrate that escort probabilities with the new dually flat structure admit a simple algorithm to compute Voronoi diagrams and centroids with respect to alpha-divergences, which are a one-parameter family of distance-like functions representing the discrepancy between two probability distributions. Voronoi diagrams on the space of probability distributions with the Kullback-Leibler or Bregman divergences have been recognized as important tools for various statistical modeling problems, including pattern classification, clustering, and likelihood ratio tests [2].

The largest advantage of considering alpha-divergences is their invariance under transformations by sufficient statistics, studied by Cencov, which is a significant requirement for such statistical applications. On the computational side, the conformal flattening of the alpha-geometry enables us to invoke the standard algorithm of Edelsbrunner, which uses a potential function and the upper envelope of hyperplanes with the escort probabilities as coordinates [6].
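The envelope idea can be illustrated with a minimal sketch on discrete distributions. It assumes the alpha-divergence in its Tsallis relative-entropy form $\tilde D_q[p:r] = \frac{1}{1-q}(1 - \sum_i p_i^q r_i^{1-q})$ with $\alpha = 1 - 2q$, the query point in the first argument, and the q-canonical coordinates and q-potential defined later in this report (Eqs. (17) and (19)). Because $\tilde D_q[p:r] = h_q(p)\,(\psi(\theta_r) + \varphi(\eta_p) - \theta_r\cdot\eta_p)$, the nearest site maximizes the affine function $\theta_k\cdot\eta - \psi(\theta_k)$ of the escort coordinates $\eta$. All function names are hypothetical, and the sketch evaluates the upper envelope pointwise instead of building the full diagram with Edelsbrunner's algorithm.

```python
import numpy as np

def theta(r, q):
    """q-canonical coordinates, Eq. (17): (r_i^{1-q} - r_0^{1-q}) / (1-q), i = 1..n."""
    return (r[1:] ** (1 - q) - r[0] ** (1 - q)) / (1 - q)

def psi(r, q):
    """q-potential, Eq. (19): -log_q r_0."""
    return -(r[0] ** (1 - q) - 1) / (1 - q)

def escort_coords(p, q):
    """Dual (escort) coordinates eta_i = p_i^q / sum_j p_j^q, i = 1..n."""
    w = p ** q
    return w[1:] / w.sum()

def tsallis_rel_entropy(p, r, q):
    """alpha-divergence up to a constant factor, alpha = 1 - 2q."""
    return (1 - np.sum(p ** q * r ** (1 - q))) / (1 - q)

def voronoi_label_envelope(p, sites, q):
    """Index of the site whose hyperplane theta_k . eta - psi_k is highest at eta(p)."""
    eta = escort_coords(p, q)
    scores = [theta(r, q) @ eta - psi(r, q) for r in sites]
    return int(np.argmax(scores))

rng = np.random.default_rng(1)
q = 0.3                                        # alpha = 1 - 2q = 0.4
sites = rng.dirichlet(np.ones(3), size=5)      # five sites in S_2
queries = rng.dirichlet(np.ones(3), size=200)
labels = [voronoi_label_envelope(p, sites, q) for p in queries]
```

Since every cell boundary is where two affine functions of $\eta$ coincide, the cells are polyhedral in escort coordinates, which is what makes the standard envelope algorithm applicable.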

We elaborate the relation between the two structures on a q-exponential family: the geometric structure of constant curvature is naturally translated into a dually flat structure by a conformal transformation. This relation yields several important geometric properties. One example is that the q-Pythagorean theorem holds among probability distributions in this family [1]. As a simple application of the theorem, we show that the q-version of the maximum entropy theorem is naturally induced.

We have also applied these mathematical results to extend statistical inference techniques. First, we show that the maximizer of the q-escort distribution is a Bayesian MAP (Maximum A posteriori Probability) estimator [1]. Second, we propose maximum q-likelihood estimation and characterize its solution geometrically [3].

On conformal flatness of level surfaces in Hessian domains, we obtain the following result [4]. In a previous paper we showed that a 1-conformally flat statistical manifold can be locally realized as a submanifold of a flat statistical manifold by constructing a level surface of a Hessian domain (Uohashi, Ohara, Fujii; 2000). However, that paper proved the realization of only a single 1-conformally flat statistical manifold. In this study we give conditions for realizing several 1-conformally flat statistical manifolds as level surfaces of their common Hessian domain. When a 1-conformally flat statistical model is embedded into a higher-dimensional model, our result may be applicable.

To construct harmonic maps, we built mappings from one level surface to another on a Hessian domain by a conformal transformation [5]. Next, we defined an alpha-structure on the level surfaces and calculated the “variations of mappings” for each value of the alpha-parameter. A harmonic map makes the variation of the mapping zero, so we give a condition for zero variation as an equation in n and the parameter “alpha”, where n is the dimension of the level surfaces. Finding relations between these harmonic maps and phenomena in statistics, physics, and other fields remains an open problem.
Abstract: The Gibbs distribution of statistical physics is an exponential family of probability distributions, which has a mathematical basis of duality in the form of the Legendre transformation. Recent studies of complex systems have found many distributions obeying the power law rather than the standard Gibbs-type distributions. The Tsallis q-entropy is a typical example capturing such phenomena. We treat the q-Gibbs distribution or the q-exponential family by generalizing the exponential function to the q-family of power functions, which is useful for studying various complex or non-standard physical phenomena. We give the q-exponential family a new mathematical structure, different from those previously given. It has a dually flat geometrical structure derived from the Legendre transformation, and conformal geometry is useful for understanding it. The q-version of the maximum entropy theorem is naturally induced from the q-Pythagorean theorem. We also show that the maximizer of the q-escort distribution is a Bayesian MAP (Maximum A posteriori Probability) estimator.

Keywords: q-exponential family; q-entropy; information geometry; q-Pythagorean theorem; q-Max-Ent theorem; conformal transformation
1. Introduction

Statistical physics is founded on the Gibbs distribution for microstates, which forms an exponential family of probability distributions known in statistics. Important macro-quantities such as energy, entropy and free energy are connected with it. However, recent studies show that there are non-standard complex systems which are subject to the power law instead of the exponential law of Gibbs-type distributions. See [1,2] as well as the extensive literature cited therein.

Tsallis [3] defined the q-entropy to elucidate various physical phenomena of this type, followed by many related research works on this subject (see [1]). The concept of the q-Gibbs distribution or q-exponential family of probability distributions is naturally induced from this framework (see also [4]). However, its mathematical structure has not yet been sufficiently explored [2,5,6], while the Gibbs-type distribution has been well studied as the exponential family of distributions [7]. We need a mathematical (geometrical) foundation to study the properties of the q-exponential family. This paper presents a geometrical foundation for the q-exponential family based on information geometry [8], giving geometrical definitions of the q-potential function, q-entropy and q-divergence in a unified way.

We define the q-geometrical structure consisting of a Riemannian metric and a pair of dual affine connections. Using this framework, we prove that a family of q-exponential distributions is dually flat and that the q-Pythagorean theorem holds in it. This naturally induces the corresponding q-maximum entropy theorem, similarly to the case of the Tsallis q-entropy [1,9,10]. The q-structure is ubiquitous, since the family $S_n$ of all discrete probability distributions can always be endowed with the structure of the q-exponential family for arbitrary $q$. It is possible to generalize the q-structure to any family of probability distributions. Further, it has a close relation with the α-geometry [8], which is an information-geometric structure of constant curvature. This new dually flat structure, different from the conventional one arising from invariance in information geometry, can also be obtained by conformal flattening of the α-geometry [11,12], using a technique from conformal and projective geometry [13-15].

The present framework provides mathematical tools for analyzing physical phenomena subject to the power law. The Legendre transformation again plays a fundamental role in deriving the geometrical dual structure. There are many applications of q-geometry to information theory ([16] and others) and statistics, including Bayes q-statistics.

It is possible to generalize our framework to a more general non-linear family of distributions by using a positive convex function instead of the q-exponential function (see [2,17]). A good example is the κ-exponential family [18-20], but we do not discuss it here.

2. q-Gibbs or q-Exponential Family of Distributions

2.1. q-Logarithm and q-Exponential Function

The first step is to generalize the logarithm and exponential functions to a family of power functions that includes them as the limiting case [1,5,21]. This was also used for defining the α-family of distributions in information geometry [8]. We define the q-logarithm by

$$\log_q(u) = \frac{1}{1-q}\left(u^{1-q} - 1\right), \quad u > 0 \tag{1}$$

and its inverse function, the q-exponential, by

$$\exp_q(u) = \{1 + (1-q)u\}^{\frac{1}{1-q}}, \quad u > -\frac{1}{1-q} \tag{2}$$

for a positive $q$ with $q \neq 1$. The limiting case $q \to 1$ reduces to

$$\log_1(u) = \log u \tag{3}$$

$$\exp_1(u) = \exp u \tag{4}$$

so that $\log_q$ and $\exp_q$ are defined for $q > 0$.
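A minimal Python sketch of definitions (1) and (2), assuming real scalar arguments; the function names `log_q` and `exp_q` are our own. It also checks numerically the derivative identity $\frac{d}{du}\exp_q(u) = \exp_q(u)^q$, which underlies the differentiation of the q-exponential family in Section 2.3.

```python
import math

def log_q(u: float, q: float) -> float:
    """q-logarithm, Eq. (1); reduces to the natural log at q = 1, Eq. (3)."""
    if u <= 0:
        raise ValueError("log_q is defined only for u > 0")
    if q == 1:
        return math.log(u)
    return (u ** (1 - q) - 1) / (1 - q)

def exp_q(u: float, q: float) -> float:
    """q-exponential, Eq. (2); reduces to exp at q = 1, Eq. (4)."""
    if q == 1:
        return math.exp(u)
    base = 1 + (1 - q) * u
    if base <= 0:
        # for 0 < q < 1 this is the condition u > -1/(1-q) of Eq. (2)
        raise ValueError("u is outside the domain of exp_q")
    return base ** (1 / (1 - q))

# exp_q inverts log_q, and d/du exp_q(u) = exp_q(u)**q (checked by a central difference)
q, u, h = 0.5, 0.3, 1e-6
print(exp_q(log_q(2.5, q), q))
print((exp_q(u + h, q) - exp_q(u - h, q)) / (2 * h), exp_q(u, q) ** q)
```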

2.2.  q-Exponential  Family 


The standard form of an exponential family of distributions is written as

$$p(x,\theta) = \exp\left\{\sum_{i=1}^{n}\theta^i x_i - \psi(\theta)\right\} \tag{5}$$

with respect to an adequate measure $\mu(x)$, where $x = (x_1, \ldots, x_n)$ is a set of random variables and $\theta = (\theta^1, \ldots, \theta^n)$ are the canonical parameters describing the underlying system. The Gibbs distribution is of this type. Here, $\psi(\theta)$ is called the free energy, which is the cumulant generating function.

The power version of the Gibbs distribution is written as

$$p(x,\theta) = \exp_q\{\theta\cdot x - \psi_q(\theta)\} \tag{6}$$

$$\log_q p(x,\theta) = \theta\cdot x - \psi_q(\theta) \tag{7}$$

where $\theta\cdot x = \sum_{i=1}^{n}\theta^i x_i$. This is the q-Gibbs distribution or q-exponential family [4], which we denote by $S$, where the domain of $x$ is restricted such that $p(x,\theta) > 0$ holds. The function $\psi_q(\theta)$, called the q-free energy or q-potential function, is determined from the normalization condition:

$$\int \exp_q\{\theta\cdot x - \psi_q(\theta)\}\,dx = 1 \tag{8}$$

where we replaced $d\mu(x)$ by $dx$ for brevity's sake. The function $\psi_q$ depends on $q$, but we hereafter omit the suffix $q$ in most cases. Research on the q-exponential family can be found, for example, in [2,4,19]. The q-Gaussian distribution is given by


$$p(x,\mu,\sigma) = \exp_q\left\{-\frac{(x-\mu)^2}{2\sigma^2} - \psi(\mu,\sigma)\right\} \tag{9}$$

and is studied in detail in [22-25]. Here, we need to introduce a vector random variable $\boldsymbol{x} = (x, x^2)$ and a new parameter $\theta$, which is a vector-valued function of $\mu$ and $\sigma$, to represent it in the standard form (7). It is an interesting observation that the domain of $x$ in the q-Gaussian case depends on $q$ if $0 < q < 1$. Hence, the q- and q'-Gaussians are in general not absolutely continuous with respect to each other when $q \neq q'$.
It should be remarked that the q-exponential family itself is the same as the α-family of distributions in information geometry [8]. Here, we introduce a different geometrical structure, generalizing the result of [24].

We mainly use the family $S_n$ of discrete distributions over $(n+1)$ elements $X = \{x_0, x_1, \ldots, x_n\}$, although the results can easily be extended to the case of continuous random variables. Here, the random variable $x$ takes values in $X$. We also treat the case $0 < q < 1$, and the limiting cases $q = 0$ or $1$ give the well-known ones.

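Let us put $p_i = \mathrm{Prob}\{x = x_i\}$ and denote the probability distribution by the vector $p = (p_0, p_1, \ldots, p_n)$, where

$$\sum_{i=0}^{n} p_i = 1 \tag{10}$$

The probability of $x$ is also written as

$$p(x) = \sum_{i=0}^{n} p_i\,\delta_i(x) \tag{11}$$

where

$$\delta_i(x) = \begin{cases} 1, & x = x_i, \\ 0, & \text{otherwise}. \end{cases} \tag{12}$$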


Theorem 1. The family $S_n$ of discrete probability distributions has the structure of a q-exponential family for any $q$.


Proof. We take $\log_q$ of the distribution $p(x)$ in (11). For any function $f(u)$, we have

$$f\left\{\sum_{i=0}^{n} p_i\,\delta_i(x)\right\} = \sum_{i=0}^{n} f(p_i)\,\delta_i(x) \tag{13}$$

By taking

$$\delta_0(x) = 1 - \sum_{i=1}^{n} \delta_i(x) \tag{14}$$

into account, the discrete distribution (11) can be rewritten in the form (7) as

$$\log_q p(x) = \sum_{i=1}^{n} \frac{1}{1-q}\left(p_i^{1-q} - p_0^{1-q}\right)\delta_i(x) + \log_q p_0 \tag{15}$$

where

$$p_0 = 1 - \sum_{i=1}^{n} p_i \tag{16}$$

is treated as a function of $(p_1, \ldots, p_n)$. Hence, $S_n$ is a q-exponential family (6) for any $q$, with the following q-canonical parameters, random variables and q-potential function:

$$\theta^i = \frac{1}{1-q}\left(p_i^{1-q} - p_0^{1-q}\right), \quad i = 1, \ldots, n \tag{17}$$

$$x_i = \delta_i(x) \tag{18}$$

$$\psi(\theta) = -\log_q p_0 \tag{19}$$
This completes the proof. □
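Theorem 1 can also be checked numerically. The sketch below, assuming a randomly drawn point of $S_4$ and using hypothetical helper names, evaluates $\theta$ from (17) and $\psi$ from (19) and confirms that $\log_q p(x_j) = \theta\cdot x(x_j) - \psi(\theta)$ for every outcome $x_j$, with $x_i = \delta_i(x)$ as in (18), for several values of $q$.

```python
import numpy as np

def log_q(u, q):
    return (u ** (1 - q) - 1) / (1 - q)                     # Eq. (1), q != 1

def theta(p, q):
    return (p[1:] ** (1 - q) - p[0] ** (1 - q)) / (1 - q)   # Eq. (17)

def psi(p, q):
    return -log_q(p[0], q)                                  # Eq. (19)

def residual(p, q):
    """Largest |log_q p(x_j) - (theta . x(x_j) - psi)| over the n + 1 outcomes x_j."""
    th = theta(p, q)
    errors = []
    for j in range(len(p)):
        x = np.zeros(len(p) - 1)                            # x_i = delta_i(x_j), Eq. (18)
        if j > 0:
            x[j - 1] = 1.0
        errors.append(abs(log_q(p[j], q) - (th @ x - psi(p, q))))
    return max(errors)

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(5))                               # a random point of S_4
print([residual(p, q) for q in (0.3, 0.6, 0.9)])            # all at rounding-error level
```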

Note that the q-potential $\psi(\theta)$ and the canonical parameter $\theta$ depend on $q$, as is seen in (17) and (19). It should also be remarked that Theorem 1 does not contradict Theorem 1 in [19], which states that a parametrized family of probability distributions can belong to at most one q-exponential family. The author of [19] considers an m-dimensional parametrized submanifold in $S_n$ with $m < n$, where the canonical parameter depending on $q$ is given via the variational principle. Therefore, denoting the q-canonical parameter by $\theta_q \in \mathbb{R}^m$, we can restate his theorem in geometric terms: a linear submanifold parametrized by $\theta_q \in \mathbb{R}^m$ is not a linear submanifold parametrized by $\theta_{q'} \in \mathbb{R}^m$ when $q' \neq q$. On the other hand, the present theorem states that there exists the q-canonical parameter $\theta_q \in \mathbb{R}^n$ on the whole of $S_n$ for any $q$, and the manifold has a linear structure with respect to any $\theta_q$. This is a surprising new finding.

2.3.  q-Potential  Function 

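We study the q-geometrical structure of $S$. The q-log-likelihood is a linear form defined by

$$l_q(x,\theta) = \log_q p(x,\theta) = \sum_{i=1}^{n}\theta^i x_i - \psi(\theta) \tag{20}$$

By differentiating it with respect to $\theta^i$, with the abbreviated notation $\partial_i = \partial/\partial\theta^i$, we have

$$\partial_i l_q(x,\theta) = x_i - \partial_i\psi(\theta) \tag{21}$$

$$\partial_i\partial_j l_q(x,\theta) = -\partial_i\partial_j\psi(\theta) \tag{22}$$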


From  this  we  have  the  following  important  theorem. 


Theorem 2. The q-free energy or q-potential $\psi_q(\theta)$ is a convex function of $\theta$.


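Proof. We omit the suffix $q$ for simplicity's sake. We have

$$\partial_i p(x,\theta) = p(x,\theta)^q\left(x_i - \partial_i\psi\right) \tag{23}$$

$$\partial_i\partial_j p(x,\theta) = q\,p(x,\theta)^{2q-1}\left(x_i - \partial_i\psi\right)\left(x_j - \partial_j\psi\right) - p(x,\theta)^q\,\partial_i\partial_j\psi \tag{24}$$

The following identities hold:

$$\int \partial_i p(x,\theta)\,dx = 0 \tag{25}$$

$$\int \partial_i\partial_j p(x,\theta)\,dx = 0 \tag{26}$$

Here, we define an important functional (written $h_q(\theta)$ when $p = p(x,\theta)$)

$$h_q(p) = \int p(x)^q\,dx \tag{27}$$

in particular for discrete $S_n$,

$$h_q(p) = \sum_{i=0}^{n} p_i^q \tag{28}$$

for $0 < q < 1$. This function plays a key role in the following. From (25) and (26), by using (23) and (24), we have

$$\partial_i\psi(\theta) = \frac{1}{h_q(\theta)}\int x_i\,p(x,\theta)^q\,dx \tag{29}$$

$$\partial_i\partial_j\psi(\theta) = \frac{q}{h_q(\theta)}\int \left(x_i - \partial_i\psi\right)\left(x_j - \partial_j\psi\right) p(x,\theta)^{2q-1}\,dx \tag{30}$$

The latter shows that $\partial_i\partial_j\psi$ is positive-definite, and hence $\psi$ is convex. □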
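A numerical cross-check of (29), (30) and Theorem 2 on $S_2$ may help. The map $\theta \mapsto p$ has no closed form, so this sketch inverts (17) by bisection on $p_0$ (our own choice, valid only for $0 < q < 1$), computes $\psi(\theta) = -\log_q p_0$, and compares central finite differences of $\psi$ with the discrete forms of (29) and (30). The helper names and the sample point are hypothetical, and positive-definiteness is checked at this single point only.

```python
import numpy as np

def p_from_theta(theta, q, iters=200):
    """Invert Eq. (17) for 0 < q < 1: find p_0 with p_0 + sum_i p_i = 1 by bisection."""
    def rest(p0):
        base = np.maximum(p0 ** (1 - q) + (1 - q) * theta, 0.0)
        return base ** (1 / (1 - q))                    # p_i for i = 1..n
    lo = max(0.0, float(np.max(-(1 - q) * theta))) ** (1 / (1 - q))
    hi = 1.0
    for _ in range(iters):                              # p_0 + sum(rest(p_0)) increases with p_0
        mid = 0.5 * (lo + hi)
        if mid + rest(mid).sum() < 1.0:
            lo = mid
        else:
            hi = mid
    p0 = 0.5 * (lo + hi)
    return np.concatenate(([p0], rest(p0)))

def psi_of_theta(theta, q):
    """q-potential, Eq. (19): psi(theta) = -log_q p_0."""
    p0 = p_from_theta(theta, q)[0]
    return -(p0 ** (1 - q) - 1) / (1 - q)

q = 0.6
p = np.array([0.2, 0.3, 0.5])
th = (p[1:] ** (1 - q) - p[0] ** (1 - q)) / (1 - q)    # Eq. (17)
hq = np.sum(p ** q)                                    # Eq. (28)

# Eq. (29) on S_n: d_i psi = p_i^q / h_q
grad_formula = p[1:] ** q / hq
# Eq. (30) on S_n: outcome k contributes (x_i - d_i psi)(x_j - d_j psi) p_k^{2q-1}
X = np.vstack([np.zeros(2), np.eye(2)])               # row k: x(x_k) = (delta_1, delta_2)
dev = X - grad_formula
hess_formula = q / hq * (dev.T * p ** (2 * q - 1)) @ dev

# Central finite differences of psi in the theta coordinates
h, E = 1e-4, np.eye(2)
grad_fd = np.array([(psi_of_theta(th + h * E[i], q) - psi_of_theta(th - h * E[i], q)) / (2 * h)
                    for i in range(2)])
hess_fd = np.array([[(psi_of_theta(th + h * (E[i] + E[j]), q) - psi_of_theta(th + h * (E[i] - E[j]), q)
                      - psi_of_theta(th - h * (E[i] - E[j]), q) + psi_of_theta(th - h * (E[i] + E[j]), q))
                     / (4 * h * h) for j in range(2)] for i in range(2)])

print(np.linalg.eigvalsh(hess_formula))               # eigenvalues of (30); positive-definite here
```

Bisection was chosen over a Newton iteration because $p_0 + \sum_i p_i$ is monotone in $p_0$, so it converges without any starting-point tuning.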

2.4.  q-Divergence 

A convex function $\psi(\theta)$ makes it possible to define a Bregman-type divergence between two probability distributions $p(x,\theta_1)$ and $p(x,\theta_2)$ [8,26,27]. Using the gradient $\nabla = \partial/\partial\theta$, it is given by

$$D_q[p(x,\theta_1) : p(x,\theta_2)] = \psi(\theta_2) - \psi(\theta_1) - \nabla\psi(\theta_1)\cdot(\theta_2 - \theta_1) \tag{31}$$

satisfying the non-negativity condition

$$D_q[p(x,\theta_1) : p(x,\theta_2)] \ge 0 \tag{32}$$

with equality if and only if $\theta_1 = \theta_2$. This gives a q-divergence in $S_n$ different from the invariant divergence of $S_n$ [28]. The divergence is canonical in the sense that it is uniquely determined in accordance with the dually flat structure of the q-exponential family in Sections 3 and 4. The canonical divergence is different from the α-divergence or the conventional Tsallis relative entropy used in information geometry (see the discussion at the end of this subsection). Note that it is used in [16].


Theorem 3. For two discrete distributions $p(x) = p$ and $r(x) = r$, the q-divergence is given by

$$D_q[p : r] = \frac{1}{(1-q)h_q(p)}\left(1 - \sum_{i=0}^{n} p_i^q r_i^{1-q}\right) \tag{33}$$

Proof. The potentials are, from (19),

$$\psi(p) = -\log_q p_0, \qquad \psi(r) = -\log_q r_0 \tag{34}$$

for $p$ and $r$. We need to calculate $\nabla\psi(\theta)$ given in (29). In our case, $x_i = \delta_i(x)$ and hence

$$\partial_i\psi = \frac{p_i^q}{h_q(p)} \tag{35}$$

By using this and (17), we obtain (33). □

It is useful to consider a related probability distribution,

$$\hat{p}(x) = \frac{1}{h_q[p(x)]}\,p(x)^q \tag{36}$$
for defining the q-expectation. This is called the q-escort probability distribution [1,4,29]. Introducing the q-expectation of a random variable $f(x)$ by

$$\hat{E}_p[f(x)] = \frac{1}{h_q[p(x)]}\int p(x)^q f(x)\,dx \tag{37}$$

we can rewrite the q-divergence (31) for $p(x), r(x) \in S$ as

$$D_q[p(x) : r(x)] = \hat{E}_p\left[\log_q p(x) - \log_q r(x)\right] \tag{38}$$

because of the relations (20) and (29). The expression (38) is also valid outside $S \times S$ when it is integrable. This is different from the definition of the Tsallis relative entropy [30,31]

$$\tilde{D}_q[p(x) : r(x)] = \frac{1}{1-q}\left(1 - \int p(x)^q r(x)^{1-q}\,dx\right) \tag{39}$$

which is equal to the well-known α-divergence up to a constant factor, where $\alpha = 1 - 2q$ (see [8,28]), and which satisfies the invariance criterion. We have

$$D_q[p(x) : r(x)] = \frac{1}{h_q[p(x)]}\,\tilde{D}_q[p(x) : r(x)] \tag{40}$$

This  is  a  conformal  transformation  of  divergence,  as  we  see  in  the  following.  See  also  the  derivation 
based  on  affine  differential  geometry  [12]. 
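The four expressions for the q-divergence can be compared numerically on $S_2$. This sketch, with hypothetical helper names and arbitrarily chosen $p$, $r$ and $q$, evaluates the Bregman form (31) built from (17), (19) and (35), the closed form (33), the escort expectation (38), and the conformal relation (40) to the Tsallis relative entropy (39).

```python
import numpy as np

def log_q(u, q):
    return (u ** (1 - q) - 1) / (1 - q)

def theta(p, q):                     # Eq. (17)
    return (p[1:] ** (1 - q) - p[0] ** (1 - q)) / (1 - q)

def psi(p, q):                       # Eq. (19)
    return -log_q(p[0], q)

def grad_psi(p, q):                  # Eq. (35): escort probabilities of outcomes 1..n
    w = p ** q
    return w[1:] / w.sum()

def d_bregman(p, r, q):              # Eq. (31)
    return psi(r, q) - psi(p, q) - grad_psi(p, q) @ (theta(r, q) - theta(p, q))

def d_closed_form(p, r, q):          # Eq. (33)
    return (1 - np.sum(p ** q * r ** (1 - q))) / ((1 - q) * np.sum(p ** q))

def d_escort(p, r, q):               # Eq. (38): escort expectation of log_q p - log_q r
    esc = p ** q / np.sum(p ** q)
    return esc @ (log_q(p, q) - log_q(r, q))

def d_tsallis(p, r, q):              # Eq. (39)
    return (1 - np.sum(p ** q * r ** (1 - q))) / (1 - q)

p = np.array([0.2, 0.3, 0.5])
r = np.array([0.4, 0.4, 0.2])
q = 0.6
for name, d in [("(31)", d_bregman), ("(33)", d_closed_form), ("(38)", d_escort)]:
    print(name, d(p, r, q))
print("(40)", d_tsallis(p, r, q) / np.sum(p ** q))
```

The factor $1/h_q(p)$ in (40) depends only on the first argument, which is why (33) and (39) are conformally related rather than proportional by a constant.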


2.5.  q-Riemannian  Metric 


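When $\theta_2$ is infinitesimally close to $\theta_1$, by putting $\theta_1 = \theta$, $\theta_2 = \theta + d\theta$ and using the Taylor expansion, we have

$$D_q[p(x,\theta) : p(x,\theta + d\theta)] = \frac{1}{2}\sum g^q_{ij}(\theta)\,d\theta^i d\theta^j \tag{41}$$

where

$$g^q_{ij}(\theta) = \partial_i\partial_j\psi(\theta) \tag{42}$$

is a positive-definite matrix. We call $g^q_{ij}(\theta)$ the q-Fisher information matrix. When $q = 1$, this reduces to the ordinary Fisher information matrix given by

$$\partial_i\partial_j\psi(\theta) = g^F_{ij}(\theta) = E\left[\partial_i \log p(x,\theta)\,\partial_j \log p(x,\theta)\right] \tag{43}$$

The positive-definite matrix $g^q_{ij}(\theta)$ defines a Riemannian metric on $S_n$, giving it the q-Riemannian structure.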

When a metric tensor $g_{ij}(\theta)$ is transformed to

$$\tilde{g}_{ij}(\theta) = \sigma(\theta)\,g_{ij}(\theta) \tag{44}$$

by a positive function $\sigma(\theta)$, we call it a conformal transformation. See, e.g., [13-15,32]. The conformal transformation of divergence induces that of the Riemannian metric.
Theorem 4. The q-Fisher information metric is given by a conformal transformation of the Fisher information metric $g^F_{ij}$ as

$$g^q_{ij}(\theta) = \frac{q}{h_q(\theta)}\,g^F_{ij}(\theta) \tag{45}$$

Proof. The q-metric is derived from the Taylor expansion of $D_q[p : p + dp]$. Using the identities (25) and (26), we have

$$D_q[p(x,\theta) : p(x,\theta + d\theta)] = \frac{1}{(1-q)h_q(\theta)}\left(1 - \int p(x,\theta)^q\,p(x,\theta + d\theta)^{1-q}\,dx\right) = \frac{q}{2h_q(\theta)}\sum_{i,j}\left(\int \frac{1}{p(x,\theta)}\,\partial_i p(x,\theta)\,\partial_j p(x,\theta)\,dx\right) d\theta^i d\theta^j \tag{46}$$

The integral in (46) is the Fisher information $g^F_{ij}(\theta)$ given by (43), to which (46) reduces when $q = 1$. Comparing with (41), the q-Fisher information is given by (45). □
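Theorem 4 can be illustrated without choosing coordinates. This sketch assumes the standard fact that, for $S_n$ in the coordinates $p$, the Fisher metric evaluated on a tangent vector $v$ with $\sum_i v_i = 0$ is $\sum_i v_i^2/p_i$. It compares a symmetric second difference of the closed-form divergence (33) with $\frac{1}{2}\frac{q}{h_q}g^F(v,v)$, as predicted by (41) and (45). The point, direction and step size are arbitrary illustrative choices.

```python
import numpy as np

def d_q(p, r, q):
    """q-divergence in closed form, Eq. (33)."""
    return (1 - np.sum(p ** q * r ** (1 - q))) / ((1 - q) * np.sum(p ** q))

def fisher_quadratic(p, v):
    """Fisher metric of S_n evaluated on a tangent vector v (sum(v) == 0), in the coordinates p."""
    return np.sum(v ** 2 / p)

q = 0.4
p = np.array([0.1, 0.2, 0.3, 0.4])
v = np.array([0.5, -1.0, 0.2, 0.3])        # tangent direction; components sum to zero
eps = 1e-4

# Averaging +eps and -eps cancels the third-order term of the Taylor expansion
lhs = 0.5 * (d_q(p, p + eps * v, q) + d_q(p, p - eps * v, q)) / eps ** 2
rhs = 0.5 * q / np.sum(p ** q) * fisher_quadratic(p, v)   # (1/2) (q/h_q) g^F(v, v)
print(lhs, rhs)
```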


A Riemannian metric defines the length of a tangent vector $X = (X^1, \ldots, X^n)$ at $\theta$ by

$$\|X\|^2 = \sum g_{ij}X^iX^j \tag{47}$$

Similarly, for two tangent vectors $X$ and $Y$, their inner product is defined by

$$\langle X, Y\rangle = \sum g_{ij}X^iY^j \tag{48}$$

When $\langle X, Y\rangle$ vanishes, $X$ and $Y$ are said to be orthogonal. The orthogonality, or more generally the angle, of two vectors $X$ and $Y$ does not change under a conformal transformation, although their magnitudes do.


3. Dually Flat Structure of q-Exponential Family


3.1. Legendre Transformation and q-Entropy

Given a convex function $\psi(\theta)$, the Legendre transformation is defined by

$$\eta = \nabla\psi(\theta) \tag{49}$$

where $\nabla = (\partial/\partial\theta^i)$ is the gradient. Since the correspondence between $\theta$ and $\eta$ is one-to-one, we may consider $\eta$ as another coordinate system of $S$.

The dual potential function is defined by

$$\varphi(\eta) = \max_{\theta}\left\{\theta\cdot\eta - \psi(\theta)\right\} \tag{50}$$

which is convex with respect to $\eta$. The original coordinates are recovered from the inverse transformation given by

$$\theta = \nabla\varphi(\eta) \tag{51}$$

where $\nabla = (\partial/\partial\eta_i)$, so that $\theta$ and $\eta$ are in dual correspondence.

The  following  theorem  gives  explicit  relations  among  these  quantities. 


Entropy  2011,  13 


1178 


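Theorem 5. The dual coordinates $\eta$ are given by

$$\eta = \hat{E}_p[x] \tag{52}$$

and the dual potential is given by

$$\varphi(\eta) = \frac{1}{1-q}\left(\frac{1}{h_q(p)} - 1\right) \tag{53}$$

Proof. The relation (52) is immediate from (29). From the Legendre duality, the dual potential satisfies

$$\varphi(\eta) + \psi(\theta) - \theta\cdot\eta = 0 \tag{54}$$

when $\theta$ and $\eta$ correspond to each other by $\eta = \nabla\psi(\theta)$. Therefore,

$$\varphi(\eta) = \sum_{i=1}^{n}\theta^i\,\hat{E}_p[x_i] - \psi(\theta) \tag{55}$$

$$= \hat{E}_p\left[\log_q p(x,\theta)\right] \tag{56}$$

$$= \frac{1}{(1-q)h_q(\theta)}\left(1 - \int p(x,\theta)^q\,dx\right) \tag{57}$$

$$= \frac{1}{1-q}\left(\frac{1}{h_q(\theta)} - 1\right) \tag{58}$$

This is a convex function of $\eta$. □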


We call the q-dual potential

$$\varphi(\eta) = \hat{E}_p\left[\log_q p(x,\theta)\right] = \frac{1}{1-q}\left(\frac{1}{h_q(p)} - 1\right) \tag{59}$$

the negative q-entropy, because it is the Legendre dual of the q-free energy $\psi(\theta)$. There are various definitions of q-entropy. The Tsallis q-entropy [3] is originally defined by

$$H_q^{\mathrm{Tsallis}} = \frac{1}{1-q}\left(h_q - 1\right) \tag{60}$$

while the Rényi q-entropy [33] is

$$H_q^{\mathrm{Renyi}} = \frac{1}{1-q}\log h_q \tag{61}$$


They are mutually related by monotone functions. When $q \to 1$, all of them reduce to the Shannon entropy.

Our definition of

$$H_q = -\varphi(\eta) = \frac{1}{1-q}\left(1 - \frac{1}{h_q}\right) = \frac{H_q^{\mathrm{Tsallis}}}{h_q} \tag{62}$$

is also monotonically connected with the previous ones, but is more natural from the point of view of q-geometry. The entropy $H_q$ has been known as the normalized q-entropy, which was studied in [16,34-37].
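The relations in Theorem 5 and (59)-(62) can be checked on a concrete distribution. This sketch, with hypothetical helper names and an arbitrary point of $S_3$, verifies the Legendre identity (54), the escort-expectation form (56) of the dual potential, the identity $H_q = -\varphi = H_q^{\mathrm{Tsallis}}/h_q$, and that all three entropies approach the Shannon entropy as $q \to 1$.

```python
import numpy as np

def h_q(p, q):                       # Eq. (28)
    return np.sum(p ** q)

def theta(p, q):                     # Eq. (17)
    return (p[1:] ** (1 - q) - p[0] ** (1 - q)) / (1 - q)

def psi(p, q):                       # Eq. (19)
    return -(p[0] ** (1 - q) - 1) / (1 - q)

def eta(p, q):                       # Eq. (52): escort probabilities of outcomes 1..n
    return p[1:] ** q / h_q(p, q)

def dual_potential(p, q):            # Eq. (53)
    return (1 / h_q(p, q) - 1) / (1 - q)

def escort_mean_log_q(p, q):         # Eq. (56)
    esc = p ** q / h_q(p, q)
    return esc @ ((p ** (1 - q) - 1) / (1 - q))

def tsallis(p, q):                   # Eq. (60)
    return (h_q(p, q) - 1) / (1 - q)

def renyi(p, q):                     # Eq. (61)
    return np.log(h_q(p, q)) / (1 - q)

def normalized(p, q):                # Eq. (62)
    return (1 - 1 / h_q(p, q)) / (1 - q)

p, q = np.array([0.1, 0.2, 0.3, 0.4]), 0.6
legendre_gap = dual_potential(p, q) + psi(p, q) - theta(p, q) @ eta(p, q)   # Eq. (54)
shannon = -np.sum(p * np.log(p))
near_one = 1 - 1e-6
print(legendre_gap, [f(p, near_one) - shannon for f in (tsallis, renyi, normalized)])
```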
3.2. q-Dually Flat Structure

There are two dually coupled coordinate systems $\theta$ and $\eta$ in the q-exponential family $S$, with two potential functions $\psi(\theta)$ and $\varphi(\eta)$ for each $q$. Two affine structures are introduced by the two convex functions $\psi$ and $\varphi$; see the information geometry of dually flat spaces [8]. Although $S$ is a Riemannian manifold with the q-Fisher information matrix (45), we may nevertheless regard $S$ as an affine manifold in which $\theta$ is an affine coordinate system. The coordinates $\theta$ represent intensive quantities of a physical system. Dually, we introduce a dual affine structure on $S$, in which $\eta$ is another affine coordinate system; the coordinates $\eta$ represent extensive quantities. We can define two types of straight lines, or geodesics, in $S$ due to the q-affine structures.

For two distributions $p(x,\theta_1)$ and $p(x,\theta_2)$ in $S$, a curve $p(x,\theta(t))$ is said to be a q-geodesic connecting them when

$$\theta(t) = t\theta_1 + (1-t)\theta_2 \tag{63}$$

where $t$ is the parameter of the curve. Dually, in terms of the dual coordinates $\eta$, when

$$\eta(t) = t\eta_1 + (1-t)\eta_2 \tag{64}$$

holds, the curve is said to be a dual q-geodesic.

More generally, the q-geodesic connecting two distributions $p_1(x)$ and $p_2(x)$ is given by

$$\log_q p(x,t) = t\log_q p_1(x) + (1-t)\log_q p_2(x) - c(t) \tag{65}$$

where $c(t)$ is a normalizing term. This is rewritten as

$$p(x,t)^{1-q} = t\,p_1(x)^{1-q} + (1-t)\,p_2(x)^{1-q} - (1-q)\,c(t) \tag{66}$$

Dually, the dual q-geodesic connecting $p_1(x)$ and $p_2(x)$ is given by using the escort distributions as

$$\hat{p}(x,t) = t\,\hat{p}_1(x) + (1-t)\,\hat{p}_2(x) \tag{67}$$

Since the manifold $S$ has a q-Riemannian structure, the orthogonality of two tangent vectors is defined by the Riemannian metric. We rewrite the orthogonality of two geodesics in terms of the affine coordinates. Let us consider two small deviations $d_1p(x)$ and $d_2p(x)$ of $p(x)$, that is, from $p(x)$ to $p(x) + d_1p(x)$ and $p(x) + d_2p(x)$, which are regarded as two (infinitesimal) tangent vectors of $S$ at $p(x)$.

Lemma 1. The inner product of two deviations $d_1p$ and $d_2p$ is given by

$$\langle d_1p(x), d_2p(x)\rangle = \int d_1\hat{p}(x)\,d_2\log_q p(x)\,dx \tag{68}$$

Proof. By simple calculations, we have

$$\int d_1\hat{p}(x)\,d_2\log_q p(x)\,dx = \frac{q}{h_q}\int \frac{d_1p(x)\,d_2p(x)}{p(x)}\,dx \tag{69}$$

of which the right-hand side is the Riemannian inner product in the form of (46). □

Corollary. Two curves $\theta_1(t)$ and $\eta_2(t)$, intersecting at $t = 0$, are orthogonal when $\dot{\theta}_1(0)\cdot\dot{\eta}_2(0) = \sum_i \dot{\theta}_1^i(0)\,\dot{\eta}_{2i}(0) = 0$. Here, $\dot{\theta}_1(t)$ and $\dot{\eta}_2(t)$ denote the derivatives of $\theta_1(t)$ and $\eta_2(t)$ with respect to $t$.

The two geodesics and the orthogonality play a fundamental role in $S$, as will be seen in the following.
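Lemma 1 can be verified numerically on $S_2$. This sketch, with arbitrarily chosen (hypothetical) tangent directions whose components sum to zero, computes $d_1\hat{p}$ by a central finite difference of the escort map (36), uses $d_2\log_q p = p^{-q}\,d_2p$, and compares the pairing in (69) with $\frac{q}{h_q}\sum_i d_1p_i\,d_2p_i/p_i$.

```python
import numpy as np

def escort(p, q):                    # Eq. (36)
    w = p ** q
    return w / w.sum()

q = 0.7
p = np.array([0.15, 0.25, 0.6])
v1 = np.array([1.0, -0.4, -0.6])     # d_1 p (components sum to zero)
v2 = np.array([-0.3, 0.8, -0.5])     # d_2 p (components sum to zero)
eps = 1e-6

d1_escort = (escort(p + eps * v1, q) - escort(p - eps * v1, q)) / (2 * eps)  # d_1 p_hat
d2_log_q = p ** (-q) * v2            # d_2 log_q p, since d log_q u / du = u^{-q}
lhs = d1_escort @ d2_log_q                              # left-hand side of (69)
rhs = q / np.sum(p ** q) * np.sum(v1 * v2 / p)          # right-hand side of (69)
print(lhs, rhs)
```

The term of $d_1\hat{p}$ coming from the change of $h_q$ drops out because $\sum_i d_2p_i = 0$, which is the "simple calculation" behind (69).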
4. q-Pythagorean and q-Max-Ent Theorems

A  dually  flat  Riemannian  manifold  admits  the  generalized  Pythagorean  theorem  and  the  related 
projection  theorem  [8].  We  state  them  in  our  case. 

q-Pythagorean Theorem. For three distributions $p_1(x)$, $p_2(x)$ and $p_3(x)$ in $S$, it holds that

$$D_q[p_1 : p_2] + D_q[p_2 : p_3] = D_q[p_1 : p_3] \tag{70}$$

when the dual geodesic connecting $p_1(x)$ and $p_2(x)$ is orthogonal at $p_2(x)$ to the geodesic connecting $p_2(x)$ and $p_3(x)$ (see Figure 1).

Figure 1. q-Pythagorean theorem.
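The q-Pythagorean theorem can be reproduced on $S_2$. In this sketch (helper names and points are hypothetical), $p_2$ and $p_3$ are chosen freely; the direction of the q-geodesic from $p_2$ to $p_3$ is $\Delta\theta = \theta_3 - \theta_2$, and $p_1$ is placed on a dual q-geodesic from $p_2$ whose direction $\Delta\eta$ satisfies $\Delta\theta\cdot\Delta\eta = 0$, the orthogonality condition of the Corollary. The map from escort coordinates back to $p$ uses $p_i \propto \hat{p}_i^{1/q}$, which inverts (36).

```python
import numpy as np

def theta(p, q):                                  # Eq. (17)
    return (p[1:] ** (1 - q) - p[0] ** (1 - q)) / (1 - q)

def eta(p, q):                                    # Eq. (52): escort probabilities of outcomes 1..n
    w = p ** q
    return w[1:] / w.sum()

def from_eta(e, q):
    """Invert the escort map: p_i is proportional to p_hat_i ** (1/q)."""
    p_hat = np.concatenate(([1 - e.sum()], e))
    w = p_hat ** (1 / q)
    return w / w.sum()

def d_q(p, r, q):                                 # Eq. (33)
    return (1 - np.sum(p ** q * r ** (1 - q))) / ((1 - q) * np.sum(p ** q))

q = 0.5
p2 = np.array([0.2, 0.3, 0.5])
p3 = np.array([0.4, 0.4, 0.2])
d_theta = theta(p3, q) - theta(p2, q)             # tangent of the q-geodesic p2 -> p3
d_eta = np.array([-d_theta[1], d_theta[0]])       # chosen so that d_theta . d_eta = 0
p1 = from_eta(eta(p2, q) + 0.1 * d_eta, q)        # end point of a dual q-geodesic from p2

lhs = d_q(p1, p2, q) + d_q(p2, p3, q)
rhs = d_q(p1, p3, q)
print(lhs, rhs)
```

In general $D_q[p_1:p_3] - D_q[p_1:p_2] - D_q[p_2:p_3] = (\theta_3 - \theta_2)\cdot(\eta_2 - \eta_1)$, so the identity holds exactly (up to rounding) rather than only for nearby points.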


Given a distribution $p(x) \in S$ and a submanifold $M \subset S$, a distribution $r(x) \in M$ is said to be the q-projection (dual q-projection) of $p(x)$ to $M$ when the q-geodesic (dual q-geodesic) connecting $p(x)$ and $r(x)$ is orthogonal to $M$ at $r(x)$ (Figure 2).


Figure 2. q-projection of p to M.


q-Projection Theorem. Let $M$ be a submanifold of $S$. Given $p(x) \in S$, the point $r(x) \in M$ that minimizes $D_q[p(x) : r(x)]$ is given by the dual q-projection of $p(x)$ to $M$. The point $r(x) \in M$ that minimizes $D_q[r(x) : p(x)]$ is given by the q-projection of $p(x)$ to $M$.




We show that the well-known q-max-ent theorem in the case of the Tsallis q-entropy [1,4,9,11] is a direct consequence of the above q-Pythagorean and q-projection theorems.

q-Max-Ent Theorem. Probability distributions maximizing the q-entropies $H_q^{\mathrm{Tsallis}}$, $H_q^{\mathrm{Renyi}}$ and $H_q$ under q-linear constraints for $m$ random variables $c_k(x)$ and various values of $a_k$,

$$\hat{E}_p[c_k(x)] = a_k, \quad k = 1, \ldots, m \tag{71}$$

form a q-exponential family

$$\log_q p(x,\theta) = \sum_{i=1}^{m}\theta^i c_i(x) - \psi(\theta) \tag{72}$$

The proof is easily obtained by the standard analytical method. Here, we give a geometrical proof. Let us consider the subspace $M^* \subset S$ whose members $p(x)$ satisfy the $m$ constraints

$$E_{\hat p}[c_k(x)] = \int \hat p(x)\, c_k(x)\, dx = a_k, \quad k = 1, \dots, m. \qquad (73)$$

Since the constraints are linear in the dual affine coordinates $\eta$, or equivalently in $\hat p(x)$, $M^*$ is a linear subspace of $S$ with respect to the dual affine connection. Let $p_0(x, \theta_0)$ be the uniform distribution defined by $\theta_0 = 0$, which implies $p_0(x, \theta_0) = \mathrm{const}$ from (6). Let $\bar p(x) \in M^*$ be the q-projection of $p_0(x)$ to $M^*$ (Figure 3). Then, the divergence $D_q[p : p_0]$ from $p(x) \in M^*$ to $p_0(x)$ is decomposed as

$$D_q[p : p_0] = D_q[p : \bar p] + D_q[\bar p : p_0]. \qquad (74)$$


Let $\eta_p$ be the dual coordinates of $p(x)$. Since the divergence is written as

$$D_q[p : p_0] = \psi(\theta_0) + \varphi(\eta_p) - \theta_0 \cdot \eta_p, \qquad (75)$$

and $\theta_0 = 0$, the minimizer of $D_q[p : p_0]$ among $p(x) \in M^*$ is just $\bar p(x)$, which is also the maximizer of the entropy $-\varphi(\eta_p)$.

The trajectories of $\bar p(x)$ for various values of $a_k$ form a flat subspace orthogonal to $M^*$, implying that they form a q-exponential family of the form (6) (see Figure 3). The tangent directions $d\hat p(x)$ of $M^*$ satisfy

$$\int d\hat p(x)\, c_k(x)\, dx = 0, \quad k = 1, \dots, m. \qquad (76)$$

Hence, a q-exponential family of the form

$$\log_q p(x, \xi) = \sum_{i=1}^{m} \xi^i d_i(x) - \psi(\xi) \qquad (77)$$

is orthogonal to $M^*$ when

$$\int d\hat p(x)\, d\log_q p(x, \xi)\, dx = 0. \qquad (78)$$

This implies that $d_i(x) = c_i(x)$. Hence, we have the q-exponential family (72) that maximizes the q-entropies.
Figure 3. q-Max-Ent theorem.
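A small numerical sketch of the theorem for three states and one constraint ($m = 1$). Everything specific here is a hypothetical choice: $c(x) = (0, 1, 2)$, $a = 0.7$, $q = 0.6$, and the dual potential $\varphi(\eta) = ((\sum_i \eta_i^{1/q})^q - 1)/(1-q)$, which we assume from the escort-coordinate structure of Theorem 1 later in this report. With $\theta_0 = 0$, minimizing (75) over $M^*$ reduces to minimizing $\varphi$ along the constraint line in escort coordinates, done here by golden-section search (adequate because $\varphi$ is convex). The check is that $\log_q$ of the maximizer is affine in $c(x)$, as (72) predicts.

```python
import math

q = 0.6               # hypothetical choice of q
c = [0.0, 1.0, 2.0]   # hypothetical random variable c(x) on three states
a = 0.7               # hypothetical value of the escort expectation


def ln_q(s):
    return (s ** (1 - q) - 1) / (1 - q)


def phi(P):
    """Assumed dual potential ((sum_i P_i^(1/q))^q - 1) / (1 - q) in escort coordinates."""
    return (sum(Pi ** (1 / q) for Pi in P) ** q - 1) / (1 - q)


def on_constraint(s):
    """Escort distributions with sum P_i = 1 and sum P_i c_i = a, parametrized by s = P_2."""
    return [1 - a + s, a - 2 * s, s]


def golden_min(f, lo, hi, tol=1e-12):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2


s_star = golden_min(lambda s: phi(on_constraint(s)), 1e-9, a / 2 - 1e-9)
P = on_constraint(s_star)                        # escort distribution of the maximizer
norm = sum(Pi ** (1 / q) for Pi in P)
p = [Pi ** (1 / q) / norm for Pi in P]           # the maximizer itself

# (72) predicts that ln_q p(x) is affine in c(x): equal successive differences
d1 = ln_q(p[1]) - ln_q(p[0])
d2 = ln_q(p[2]) - ln_q(p[1])
print(p, d1, d2)
```

The search runs in escort coordinates because $M^*$ is a straight line there; the ordinary distribution is recovered only at the end.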


5. q-Bayesian MAP Estimator


Given $N$ iid observations $x_1, \dots, x_N$ from a statistical model $M = \{p(x, \xi)\}$, we have

$$p(x_1, \dots, x_N, \xi) = \prod_{i=1}^{N} p(x_i, \xi). \qquad (79)$$

Since $\log_q u$ is a monotonically increasing function, the maximizer of the q-likelihood

$$l_q(x_1, \dots, x_N, \xi) = \log_q p(x_1, \dots, x_N, \xi) \qquad (80)$$

is the same as the ordinary maximum likelihood estimator (mle). However, the estimator that maximizes the q-escort distribution, or equivalently the q-escort log-likelihood

$$\frac{1}{q}\, \hat l_q(x_1, \dots, x_N, \xi) = \log p(x_1, \dots, x_N, \xi) - \frac{N}{q} \log h_q(\xi), \qquad (81)$$

is different from this. We show that the q-escort mle is a Bayesian MAP (maximum a posteriori probability) estimator. This clarifies the meaning of the q-escort mle.

The q-escort mle is the maximizer of the q-escort distribution,

$$\hat\xi_q = \arg\max_{\xi} \hat p(x_1, \dots, x_N, \xi). \qquad (82)$$


Theorem 6. The q-escort mle $\hat\xi_q$ is the Bayesian MAP estimator with the prior distribution

$$\pi(\xi) = h_q(\xi)^{-N/q}. \qquad (83)$$


Proof. The Bayesian MAP is the maximizer of the posterior distribution with prior $\pi(\xi)$,

$$p(\xi \mid x_1, \dots, x_N) = \frac{\pi(\xi)\, p(x_1, \dots, x_N, \xi)}{p(x_1, \dots, x_N)}, \qquad (84)$$




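which also maximizes

$$\{\pi(\xi)\, p(x_1, \dots, x_N, \xi)\}^q, \quad \text{for } q > 0. \qquad (85)$$

On the other hand, the q-escort mle is the maximizer of

$$\hat p(x_1, \dots, x_N, \xi) = \frac{p(x_1, \dots, x_N, \xi)^q}{h_q(\xi)^N}. \qquad (86)$$

Hence, when

$$\pi(\xi) = h_q(\xi)^{-N/q}, \qquad (87)$$

the two estimators are identical. □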

The theorem shows that the Bayesian prior has a peak at the maximizer of our q-entropy $H_q$.
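To illustrate Theorem 6, the sketch below uses a hypothetical Bernoulli model $p(x = 1, \xi) = \xi$ with made-up data ($k = 8$ ones in $N = 10$ trials) and $q = 0.5$. It assumes that $h_q(\xi) = \sum_x p(x, \xi)^q$ is taken for a single observation, so that the escort of the joint distribution is $p(x_1, \dots, x_N, \xi)^q / h_q(\xi)^N$. A grid search compares the ordinary mle, the maximizer of the q-likelihood (80), the q-escort mle and the MAP estimator under the prior (83).

```python
import math

q = 0.5        # assumed value of q
N, k = 10, 8   # hypothetical data: k ones among N Bernoulli observations


def loglik(xi):
    """Ordinary log-likelihood log p(x_1, ..., x_N, xi)."""
    return k * math.log(xi) + (N - k) * math.log(1 - xi)


def q_lik(xi):
    """q-likelihood (80): log_q of the likelihood."""
    return (math.exp(loglik(xi)) ** (1 - q) - 1) / (1 - q)


def log_hq(xi):
    """log h_q(xi), with h_q(xi) = sum_x p(x, xi)^q for one observation."""
    return math.log(xi ** q + (1 - xi) ** q)


def escort_loglik(xi):
    """log of p(x_1..x_N, xi)^q / h_q(xi)^N."""
    return q * loglik(xi) - N * log_hq(xi)


def log_posterior(xi):
    """Unnormalized log posterior with the prior pi(xi) = h_q(xi)^(-N/q), cf. (83)."""
    return -(N / q) * log_hq(xi) + loglik(xi)


grid = [i / 10000 for i in range(1, 10000)]
mle = max(grid, key=loglik)
q_mle = max(grid, key=q_lik)
escort_mle = max(grid, key=escort_loglik)
map_est = max(grid, key=log_posterior)
print(mle, q_mle, escort_mle, map_est)
```

The first two estimates coincide, as stated after (80), and the last two coincide, as Theorem 6 states. For $q = 1$ the factor $h_q$ equals one, the prior becomes flat and all four estimators reduce to the ordinary mle.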

6.  Conclusions 

Much attention has recently been paid to probability distributions subject to the power law, instead of the exponential law, since Tsallis proposed the q-entropy and related theories. The power law is also found in various communication networks, and it is now an active topic of research.

However, no such geometrical foundation has been available, whereas one for the ordinary family of probability distributions is given by information geometry [8]. The present paper tried to give a geometrical foundation to the q-family of probability distributions. We introduced a new notion of the q-geometry. The q-structure is ubiquitous in the sense that the family of all discrete probability distributions (and the family of all continuous probability distributions, if we neglect delicate problems involved in the infinite dimensionality) belongs to the q-exponential family of distributions for any q. That is, we can introduce the q-geometrical structure to an arbitrary family of probability distributions, because any parametrized family of probability distributions forms a submanifold embedded in the entire manifold.

The q-structure consists of a Riemannian metric together with a pair of dually coupled affine connections, which fits within the framework of standard information geometry. However, the q-structure is essentially different from the standard one derived by the invariance criterion of the manifold of probability distributions. We have taken a novel look at the theory related to the q-entropy from the viewpoint of conformal transformations. This leads us to unified definitions of various quantities such as the q-entropy, q-divergence, q-potential function and their duals, as well as new interpretations of known quantities.

This provides a geometrical foundation, and we expect that the paper will contribute to further developments in this field.
Conformal geometry of escort probabilities and its application to Voronoi partitions

A. Ohara†, H. Matsuzoe, S. Amari

†Osaka University

Escort probability arises naturally in research on multifractals [1] and nonextensive statistical mechanics [2], where it plays an important but mysterious role. Testing its utility in other scientific fields would greatly help our understanding of it. This motivates us to approach the escort probability by geometrically studying its role in information science. The first purpose of this presentation is to investigate the escort probability from the viewpoints of information geometry [3] and affine differential geometry [4]. The second is to show that escort probability with information geometric structure is useful for constructing Voronoi partitions (or diagrams) [5] on the space of probability distributions. Recently, it has been reported [6] that alpha-geometry, which is an information geometric structure of constant curvature, has a close relation with Tsallis statistics [2]. A remarkable feature of the alpha-geometry is that it consists of the Fisher metric together with a one-parameter family of dual affine connections, called the alpha-connections. We prove that the manifold of escort probability distributions is dually flat by considering conformal transformations that flatten the alpha-geometry on the manifold of usual probability distributions. On the resultant manifold, escort probabilities form an affine coordinate system. The result gives us a clear geometrical interpretation of the escort probability and, simultaneously, reveals new and less obvious links to conformality and projectivity. Owing to these two geometrical concepts, however, the obtained dually flat structure inherits several properties of the alpha-geometry. Dual flatness proves crucial to the construction of Voronoi partitions for alpha-divergences, which we shall call alpha-Voronoi partitions. Voronoi partitions on the space of probability distributions with the Kullback-Leibler or Bregman divergences have been recognized as important tools for various statistical modeling problems involving pattern classification, clustering, likelihood ratio tests and so on. The greatest advantage of considering alpha-divergences is their invariance under transformations by sufficient statistics (see also [3] for a different viewpoint), which is a significant requirement for those statistical applications. On the computational side, the conformal flattening of the alpha-geometry enables us to invoke the standard algorithm [5] using a potential function and an upper envelope of hyperplanes with the escort probabilities as coordinates.

[1] C. Beck and F. Schlögl, Thermodynamics of Chaotic Systems (Cambridge University Press, 1993).

[2] C. Tsallis, Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World (Berlin/Heidelberg: Springer, 2009).

[3] S.-I. Amari and H. Nagaoka, Methods of Information Geometry (Rhode Island: AMS & Oxford, 2000).

[4] K. Nomizu and T. Sasaki, Affine Differential Geometry (Cambridge University Press, 1993).

[5] H. Edelsbrunner, Algorithms in Combinatorial Geometry (Springer-Verlag, 1987).

[6] A. Ohara, Phys. Lett. A 370 184 (2007); Eur. Phys. J. B 70 15 (2009).


Characteristics of bubble in house price distribution of Japan

T. Ohnishi†, T. Mizuno, C. Shimizu, T. Watanabe

†Canon Institute for Global Studies and University of Tokyo

We empirically investigate the house price distributions in the Greater Tokyo Area using housing information published on a weekly basis by Recruit Co., Ltd. This dataset contains individual listings of 724,416 condominiums from 1986 to 2009, including the period of the housing bubble. The attributes of a house, such as its size, location and age, are also included. This dataset covers more than 95 percent of the entire transactions in the central part of Tokyo (the 23 special wards of Tokyo). We find that the cross-sectional distribution of house prices has a fat upper tail, and the tail part is close to that of a power law distribution with exponent 73. On the other hand, the cross-sectional distribution of house sizes measured in terms of floor space has less fat tails than the price distribution and is close to an exponential distribution with mean 25 square meters. We also find a positive linear relationship between the log price of a house and its size: an increase in the house size by one square meter leads to a 1.3 percent increase in the house price. Consistent with these findings, we construct a size-adjusted price by subtracting the house size (multiplied by a positive coefficient) from the log price. We find that the size-adjusted price follows a lognormal distribution except for the period of the asset bubble and its collapse in Tokyo, for which the price distribution remains asymmetric and skewed to the right even after controlling for the size effect. As for the period of the bubble and its collapse, we find some evidence that the sharp price movements were concentrated in particular areas, and this spatial heterogeneity is the source of the fat upper tail. These findings show that the cross-sectional distribution of size-adjusted prices is very close to a lognormal distribution during regular times but deviates substantially from a lognormal during the bubble period. This suggests that the shape of the size-adjusted price distribution, especially the shape of the tail part, may contain information useful for the detection of housing bubbles. That is, the presence of a bubble can be safely ruled out if recent price observations are found to follow a lognormal distribution. On the other hand, if there are many outliers, especially near the upper tail, this may indicate the presence of a bubble, since such price observations are very unlikely to occur if they follow a lognormal distribution. This method of identifying bubbles is quite different from conventional ones based on aggregate measures of housing prices, and therefore should be a useful tool to supplement existing methods.
Dually flat structure with escort probability and its application to alpha-Voronoi diagrams†
Abstract. This paper studies the geometrical structure of the manifold of escort probability distributions and shows its new applicability to information science. In order to realize escort probabilities, we use a conformal transformation that flattens the so-called alpha-geometry of the space of discrete probability distributions, which characterizes nonadditive statistics on the space well. As a result, escort probabilities are proved to be flat coordinates of the usual probabilities for the derived dually flat structure. Finally, we demonstrate that escort probabilities with the new structure admit a simple algorithm to compute Voronoi diagrams and centroids with respect to alpha-divergences.


PACS  numbers:  05.90.+m,  89.70.Cf,  02.40.Hw 


† Several results in this paper can be found in the conference paper [36] without complete proofs.
Escort probability arises naturally in research on multifractals [1] and nonextensive statistical mechanics [2], where it plays an important but mysterious role. Testing its utility in other scientific fields would greatly help our understanding of it. This motivates us to approach the escort probability by geometrically studying its role in information science.

The first purpose of this paper is to investigate the escort probability from the viewpoints of information geometry [3, 4] and affine differential geometry [5]. The second is to show that escort probability with information geometric structure is useful for constructing Voronoi diagrams [6] on the space of probability distributions.

Recently, it has been reported [7, 8] that α-geometry, which is an information geometric structure of constant curvature, has a close relation with Tsallis statistics [2]. A remarkable feature of the α-geometry is that it consists of the Fisher metric together with a one-parameter family of dual affine connections, called the α-connections.

We prove that the manifold of escort probability distributions is dually flat by considering conformal transformations that flatten the α-geometry on the manifold of usual probability distributions. On the resultant manifold, escort probabilities form an affine coordinate system. See also [9] for another type of flattening of a curved dual manifold by a conformal transformation.

The result gives us a clear geometrical interpretation of the escort probability and, simultaneously, reveals new and less obvious links to conformality and projectivity. Owing to these two geometrical concepts, however, the obtained dually flat structure inherits several properties of the α-geometry.

Dual flatness proves crucial to the construction of Voronoi diagrams for α-divergences, which we shall call α-Voronoi diagrams. Voronoi diagrams on the space of probability distributions with the Kullback-Leibler [10, 11] or Bregman divergences [12] have been recognized as important tools for various statistical modeling problems involving pattern classification, clustering, likelihood ratio tests and so on. See also, e.g., [13, 14, 15] for related problems.

The greatest advantage of considering α-divergences is their invariance under transformations by sufficient statistics [16] (see also [4] for a different viewpoint), which is a significant requirement for those statistical applications. On the computational side, the conformal flattening of the α-geometry enables us to invoke the standard algorithm [29, 6] using a potential function and an upper envelope of hyperplanes with the escort probabilities as coordinates.

Section 2 is devoted to preliminaries on α-geometry in the light of affine differential geometry. In Section 3, as a main result, we consider conformal transformations and discuss properties of the obtained dually flat structure. Dual pairs of potential functions and affine coordinate systems on the manifold are explicitly identified, and the associated canonical divergence is shown to be conformal to the α-divergence. Section 4 describes an application of such a flattened geometric structure to α-Voronoi diagrams on the probability simplex. The properties and a construction algorithm are discussed. Further, a formula for the α-centroid is touched upon.

In the sequel, we fix the relation between the two parameters $q$ and $\alpha$ as $q = (1-\alpha)/2$, and restrict to $q > 0$.


2.  Preliminaries 


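We briefly introduce α-geometry via affine differential geometry. See [7, 8] for details. Let $\mathcal{S}^n$ denote the $n$-dimensional probability simplex, i.e.,

$$\mathcal{S}^n := \left\{ p = (p_i) \;\middle|\; p_i > 0,\ \sum_{i=1}^{n+1} p_i = 1 \right\}, \qquad (1)$$

where $p_i$, $i = 1, \dots, n+1$, denote probabilities of $n+1$ states. We introduce the α-geometric structure on $\mathcal{S}^n$. Let $\{\partial_i\}$, $i = 1, \dots, n$, be the natural basis tangent vector fields on $\mathcal{S}^n$ defined by

$$\partial_i := \frac{\partial}{\partial p_i} - \frac{\partial}{\partial p_{n+1}}, \quad i = 1, \dots, n, \qquad (2)$$

where $p_{n+1} = 1 - \sum_{i=1}^{n} p_i$. Now we define a Riemannian metric $g$ on $\mathcal{S}^n$, called the Fisher metric:

$$g_{ij}(p) := g(\partial_i, \partial_j) = \frac{\delta_{ij}}{p_i} + \frac{1}{p_{n+1}} = \sum_{k=1}^{n+1} p_k (\partial_i \log p_k)(\partial_j \log p_k), \quad i, j = 1, \dots, n. \qquad (3)$$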
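A quick check that the two expressions in (3) agree at an arbitrary point of $\mathcal{S}^3$: along the basis vector $\partial_i$ of (2), $\log p_i$ changes at rate $1/p_i$, $\log p_{n+1}$ at rate $-1/p_{n+1}$, and all other $\log p_k$ stay fixed. The function names are ours, chosen for illustration.

```python
def fisher_closed(p, i, j):
    """g_ij = delta_ij / p_i + 1 / p_{n+1}; p lists all n+1 probabilities."""
    return (1.0 / p[i] if i == j else 0.0) + 1.0 / p[-1]


def fisher_score(p, i, j):
    """sum_k p_k (d_i log p_k)(d_j log p_k) with d_i = d/dp_i - d/dp_{n+1}."""
    last = len(p) - 1

    def dlog(a, k):
        # derivative of log p_k along the basis vector d_a
        if k == a:
            return 1.0 / p[k]
        if k == last:
            return -1.0 / p[k]
        return 0.0

    return sum(p[k] * dlog(i, k) * dlog(j, k) for k in range(len(p)))


p = [0.1, 0.2, 0.3, 0.4]
n = len(p) - 1
G1 = [[fisher_closed(p, i, j) for j in range(n)] for i in range(n)]
G2 = [[fisher_score(p, i, j) for j in range(n)] for i in range(n)]
print(G1)
print(G2)
```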


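Further, define a torsion-free affine connection called the α-connection, which is represented in its coefficients by

$$\Gamma^{(\alpha)k}_{ij}(p) = \frac{1+\alpha}{2}\left(-\frac{\delta^k_{ij}}{p_i} + p_k\, g_{ij}(p)\right), \quad i, j, k = 1, \dots, n, \qquad (4)$$

where $\delta^k_{ij}$ is equal to one if $i = j = k$ and zero otherwise. Then the α-covariant derivative applied to the vector fields $\partial_i$ and $\partial_j$ gives

$$\nabla^{(\alpha)}_{\partial_i} \partial_j = \sum_{k=1}^{n} \Gamma^{(\alpha)k}_{ij}\, \partial_k.$$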
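One way to sanity-check (4): for $\alpha = 1$, curves along which $\log(p_i/p_{n+1})$ varies affinely should be $\nabla^{(1)}$-geodesics, i.e. satisfy $\ddot p_k + \sum_{i,j} \Gamma^{(1)k}_{ij} \dot p_i \dot p_j = 0$, while for $\alpha = -1$ the coefficients vanish and only straight lines in $p$ are geodesics. The sketch below checks this with central finite differences, assuming the form of (4) written above; the curve coefficients, the point $t = 0.2$ and the step $h$ are arbitrary.

```python
import math


def fisher(p, i, j):
    return (1.0 / p[i] if i == j else 0.0) + 1.0 / p[-1]


def gamma(p, alpha, i, j, k):
    """Coefficients Gamma^{(alpha)k}_{ij} of (4)."""
    delta = 1.0 if i == j == k else 0.0
    return (1 + alpha) / 2 * (-delta / p[i] + p[k] * fisher(p, i, j))


def curve(t):
    """Curve on S^3 with log(p_i / p_4) affine in t (arbitrary coefficients)."""
    th = [0.3 + 1.0 * t, -0.2 + 0.5 * t, 0.1 - 0.7 * t, 0.0]
    w = [math.exp(x) for x in th]
    s = sum(w)
    return [wi / s for wi in w]


def geodesic_residual(alpha, t=0.2, h=1e-4):
    """Left-hand side of the geodesic equation, estimated by finite differences."""
    pm, p0, pp = curve(t - h), curve(t), curve(t + h)
    n = len(p0) - 1
    vel = [(pp[i] - pm[i]) / (2 * h) for i in range(n)]
    acc = [(pp[i] - 2 * p0[i] + pm[i]) / h ** 2 for i in range(n)]
    return [acc[k] + sum(gamma(p0, alpha, i, j, k) * vel[i] * vel[j]
                         for i in range(n) for j in range(n))
            for k in range(n)]


print(max(abs(r) for r in geodesic_residual(1.0)))   # close to zero
print(max(abs(r) for r in geodesic_residual(-1.0)))  # not zero: not a straight line in p
```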

There are two specific features of the α-geometry on $\mathcal{S}^n$ defined in this way. First, the triple $(\mathcal{S}^n, g, \nabla^{(\alpha)})$ is a statistical manifold [17] (see Appendix A for its definition), i.e., we can confirm that the following relation holds:

$$X g(Y, Z) = g(\nabla^{(\alpha)}_X Y, Z) + g(Y, \nabla^{(-\alpha)}_X Z), \quad X, Y, Z \in \mathcal{X}(\mathcal{S}^n), \qquad (5)$$

where $\mathcal{X}(\mathcal{S}^n)$ denotes the set of all tangent vector fields on $\mathcal{S}^n$. The two statistical manifolds $(\mathcal{S}^n, g, \nabla^{(\alpha)})$ and $(\mathcal{S}^n, g, \nabla^{(-\alpha)})$ are said to be mutually dual.

The other is that $(\mathcal{S}^n, g, \nabla^{(\alpha)})$ is a manifold of constant curvature $\kappa = (1-\alpha^2)/4$, i.e.,

$$R^{(\alpha)}(X, Y)Z = \kappa\{g(Y, Z)X - g(X, Z)Y\},$$

where $R^{(\alpha)}$ is the curvature tensor with respect to $\nabla^{(\alpha)}$. From this property the well-known nonadditive formula of the Tsallis entropy can be derived [7].

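In [8] we have discussed the α-geometry on $\mathcal{S}^n$ from the viewpoint of affine differential geometry [5]. Consider the immersion $f$ of $\mathcal{S}^n$ into $\mathbf{R}^{n+1}$ given by

$$f : p = (p_i) \mapsto x = (x^i) = \bigl(L^{(\alpha)}(p_i)\bigr), \quad i = 1, \dots, n+1, \qquad (6)$$

where $(x^i)$, $i = 1, \dots, n+1$, is the canonical flat coordinate system of $\mathbf{R}^{n+1}$ and the function $L^{(\alpha)}$ is defined by

$$L^{(\alpha)}(t) := \frac{2}{1-\alpha}\, t^{(1-\alpha)/2} = \frac{1}{q}\, t^{q}.$$

Note that $f(\mathcal{S}^n)$ is a level hypersurface in the ambient space $\mathbf{R}^{n+1}_{+}$ represented by $\Psi(x) = 2/(1+\alpha)$, where

$$\Psi(x) := \frac{2}{1+\alpha} \sum_{i=1}^{n+1} \left(\frac{1-\alpha}{2}\, x^i\right)^{2/(1-\alpha)} = \frac{1}{1-q} \sum_{i=1}^{n+1} \left(q\, x^i\right)^{1/q}. \qquad (7)$$

We choose a transversal vector $\xi$ on the level hypersurface by

$$\xi = -q(1-q) \sum_{i=1}^{n+1} x^i \frac{\partial}{\partial x^i} = -\kappa x. \qquad (8)$$

Then we can confirm that the affine immersion $(f, \xi)$ realizes the α-geometry on $\mathcal{S}^n$ [8]. Hence, it would be possible to develop the theory of the α-geometry and Tsallis statistics with ideas of affine differential geometry [18].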

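Further, the escort probability [1] naturally appears in this setup. The escort probability $P = (P_i)$ associated with $p = (p_i)$ is the normalized version of $((p_i)^q)$, and is defined by

$$P_i(p) := \frac{(p_i)^q}{Z_q(p)} = \frac{x^i(p)}{\sum_{k=1}^{n+1} x^k(p)}, \quad i = 1, \dots, n+1, \qquad Z_q(p) := \sum_{k=1}^{n+1} (p_k)^q, \quad x(p) \in f(\mathcal{S}^n). \qquad (9)$$

Hence, the simplex $\mathcal{E}^n$ in the ambient space $\mathbf{R}^{n+1}$, i.e.,

$$\mathcal{E}^n := \left\{ x = (x^i) \;\middle|\; \sum_{i=1}^{n+1} x^i = 1,\ x^i > 0 \right\},$$

represents the set of escort distributions $P$.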

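Note that the element $x^* = (x^*_i)$ in the dual space of $\mathbf{R}^{n+1}$ defined by

$$x^*_i(p) := L^{(-\alpha)}(p_i) = \frac{1}{1-q}\,(p_i)^{1-q}, \quad i = 1, \dots, n+1,$$

meets

$$x^*_i(p) = \frac{\partial \Psi}{\partial x^i}\bigl(x(p)\bigr).$$

Hence, it satisfies [8]

$$-\sum_{i=1}^{n+1} x^*_i(p)\, \xi^i(p) = 1, \qquad \sum_{i=1}^{n+1} x^*_i(p)\, X^i = 0, \qquad (10)$$

for an arbitrary vector $X = \sum_{i=1}^{n+1} X^i\, \partial/\partial x^i$ at $x(p)$ tangent to $f(\mathcal{S}^n)$. Thus, $-x^*(p)$ can be interpreted as the conormal map [5].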
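The relations (6) to (10) are easy to verify numerically. The sketch below assumes the closed forms $x^i = p_i^q/q$ and $x^*_i = p_i^{1-q}/(1-q)$ (with $q = (1-\alpha)/2$) and checks that $f(p)$ lies on the level set $\Psi = 1/(1-q)$, that $x^*$ is the gradient of $\Psi$ at $x(p)$, that the escort probabilities equal the normalized immersion coordinates, and that $-x^*$ satisfies the conormal conditions (10). The point $p$, the tangent direction and $q = 0.7$ are arbitrary.

```python
q = 0.7                 # arbitrary, alpha = 1 - 2q
kappa = q * (1 - q)     # kappa = (1 - alpha^2) / 4

p = [0.1, 0.2, 0.3, 0.4]                          # a point of S^3
x = [pi ** q / q for pi in p]                     # immersion (6): x^i = p_i^q / q
x_star = [pi ** (1 - q) / (1 - q) for pi in p]    # x*_i = p_i^(1-q) / (1-q)
xi_vec = [-kappa * t for t in x]                  # transversal vector (8)

Psi = sum((q * t) ** (1 / q) for t in x) / (1 - q)          # level function (7)
grad_Psi = [(q * t) ** (1 / q - 1) / (1 - q) for t in x]    # dPsi/dx^i at x(p)

Zq = sum(pi ** q for pi in p)
P = [pi ** q / Zq for pi in p]                    # escort probabilities (9)
P_from_x = [t / sum(x) for t in x]                # normalized immersion coordinates

# tangent vector to f(S^n): derivative of x(p) along dp = (1, 0, 0, -1)
dp = [1.0, 0.0, 0.0, -1.0]
X = [pi ** (q - 1) * d for pi, d in zip(p, dp)]

pair_xi = -sum(a * b for a, b in zip(x_star, xi_vec))   # first condition in (10)
pair_X = sum(a * b for a, b in zip(x_star, X))          # second condition in (10)
print(Psi, 1 / (1 - q), pair_xi, pair_X)
```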
3. A conformally and projectively flat geometric structure and escort probabilities

In this section we show a main result. For this purpose, we consider a conformal and projective transformation [19, 20, 21, 22] of the α-geometry to introduce a dually flat one. This flattening of the α-geometry preserves some of its properties. The escort probabilities $(P_i)$ are found to represent one of the mutually dual affine coordinate systems in the induced geometry. While many functions and geometric quantities introduced in this section depend on the parameter $\alpha$ or $q$, we omit this dependence for brevity.

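Let us define a function $\lambda$ on $\mathcal{S}^n$ by

$$\lambda(p) := \frac{q}{Z_q(p)},$$

which depends on $\alpha$. Then, from (9), $\mathcal{E}^n$ is regarded as the image of $\mathcal{S}^n$ under another immersion $\tilde f := \lambda f$, i.e.,

$$\tilde f : \mathcal{S}^n \ni (p_i) \mapsto (P_i) \in \mathcal{E}^n, \quad i = 1, \dots, n+1,$$

and $(P_1, \dots, P_n)$ is interpreted as another coordinate system of $\mathcal{S}^n$. Note that the inverse mapping $\tilde f^{-1}$ is well defined by

$$\tilde f^{-1} : (P_i) \mapsto (p_i) = \left( \frac{(P_i)^{1/q}}{\sum_{j=1}^{n+1} (P_j)^{1/q}} \right), \quad i = 1, \dots, n+1.$$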
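A round-trip check of $\tilde f$ and $\tilde f^{-1}$, assuming the conformal factor $\lambda(p) = q/Z_q(p)$, so that $\lambda x^i = P_i$ with $x^i = p_i^q/q$. The values of $q$ and $p$ are arbitrary; any $q > 0$ with $q \neq 1$ works.

```python
q = 1.5   # arbitrary q > 0, q != 1

p = [0.05, 0.15, 0.3, 0.5]
Zq = sum(pi ** q for pi in p)
lam = q / Zq                                  # assumed conformal factor lambda = q / Z_q
x = [pi ** q / q for pi in p]                 # original immersion f
P = [lam * t for t in x]                      # f~ = lambda f: escort probabilities
S = sum(Pi ** (1 / q) for Pi in P)
p_back = [Pi ** (1 / q) / S for Pi in P]      # inverse map f~^{-1}
print(P, p_back)
```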


It would be natural to introduce a geometric structure on $\mathcal{E}^n$ (and hence on $\mathcal{S}^n$) via the affine immersion $(\tilde f, \tilde\xi)$ by taking a suitable transversal vector $\tilde\xi$, similarly to the case of the α-geometry mentioned above. Since $\mathcal{E}^n$ is a part of a hyperplane in $\mathbf{R}^{n+1}$, the canonical affine connection of $\mathbf{R}^{n+1}$ induces a flat connection, denoted by $D$, on $\mathcal{E}^n$. However, for the same reason, we cannot define a Riemannian metric in this way§, because it vanishes on $\mathcal{E}^n$ regardless of the choice of the transversal vector $\tilde\xi$.

The idea we adopt here is to define a Riemannian metric by utilizing a property of $(\mathcal{S}^n, g, \nabla^{(\alpha)})$ called $-1$-conformal flatness. Based on the results proved by Kurose [19, 20], we conclude that the manifold $(\mathcal{S}^n, g, \nabla^{(\alpha)})$ is $\pm 1$-conformally flat (see Appendix A for its definition) because it is a statistical manifold of constant curvature.

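Actually, let $\nabla^*$ be the flat connection|| on $\mathcal{S}^n$ defined with $D$ and the differential $\tilde f_*$ by

$$\tilde f_*(\nabla^*_X Y) = D_X \tilde f_* Y, \quad X, Y \in \mathcal{X}(\mathcal{S}^n).$$

Then, we can prove that $\nabla^{(\alpha)}$ and $\nabla^*$ are projectively equivalent [5], i.e., it holds that

$$\nabla^*_X Y = \nabla^{(\alpha)}_X Y + d(\ln\lambda)(Y)\,X + d(\ln\lambda)(X)\,Y, \quad X, Y \in \mathcal{X}(\mathcal{S}^n). \qquad (11)$$

Hence, if we define another Riemannian metric $h$ on $\mathcal{S}^n$ by

$$h(X, Y) := \lambda\, g(X, Y), \quad X, Y \in \mathcal{X}(\mathcal{S}^n), \qquad (12)$$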


§ In affine differential geometry, a Riemannian metric is realized as the affine fundamental form of an affine immersion [5].

|| For the sake of notational consistency with the existing literature, e.g., [3, 4], we first define $\nabla^*$, and later $\nabla$ as the dual of $\nabla^*$.
then $(\mathcal{S}^n, g, \nabla^{(\alpha)})$ is $-1$-conformally equivalent to $(\mathcal{S}^n, h, \nabla^*)$ equipped with a flat connection $\nabla^*$. Further, the manifold $(\mathcal{S}^n, h, \nabla^*)$ can be proved to be a statistical manifold (see Appendix B).

Using the conormal map $-x^*(p)$, we can define the α-divergence as a contrast function (see Appendix A) inducing $(g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ as follows [20]:

$$D^{(\alpha)}(p, r) = -\sum_{i=1}^{n+1} x^*_i(r)\bigl(x^i(p) - x^i(r)\bigr) = \langle -x^*(r), x(p) - x(r) \rangle = \frac{1}{\kappa} - \langle x^*(r), x(p) \rangle.$$

The statistical manifolds $(\mathcal{S}^n, g, \nabla^{(-\alpha)})$ and $(\mathcal{S}^n, g, \nabla^{(\alpha)})$ are dual in the sense of (5). Further, it is known [4] that there exists a unique affine flat connection $\nabla$ on $\mathcal{S}^n$ that is dual to $\nabla^*$ with respect to $h$. Then, according to [20], $(\mathcal{S}^n, h, \nabla)$ is 1-conformally equivalent to $(\mathcal{S}^n, g, \nabla^{(-\alpha)})$, and a contrast function $\rho$ inducing $(h, \nabla, \nabla^*)$ is given by scaling $D^{(\alpha)}$ (see Appendix A) as follows:

$$\rho(p, r) = \lambda(r)\, D^{(\alpha)}(r, p) = \frac{q}{Z_q(r)} \langle -x(r), x^*(p) - x^*(r) \rangle = \langle -P(r), x^*(p) - x^*(r) \rangle. \qquad (13)$$

We shall call $\rho$ a conformal divergence.

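Now, since $(\mathcal{S}^n, h, \nabla, \nabla^*)$ is a dually flat space, the standard result in [3, 4] implies that there exist mutually dual affine coordinate systems $(\theta^1, \dots, \theta^n)$ and $(\eta_1, \dots, \eta_n)$, a potential function $\psi(\theta)$ and its conjugate $\psi^*(\eta)$ satisfying

$$\eta_i = \frac{\partial \psi}{\partial \theta^i}, \qquad \theta^i = \frac{\partial \psi^*}{\partial \eta_i}, \qquad h\!\left(\frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \eta_j}\right) = \delta_i^j. \qquad (14)$$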

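They completely determine the dually flat structure, i.e., the coefficients of $h$, $\nabla$ and $\nabla^*$ are derived as the second and third derivatives of $\psi$ or $\psi^*$, for example,

$$h_{ij} = h\!\left(\frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \theta^j}\right) = \frac{\partial^2 \psi}{\partial \theta^i \partial \theta^j}, \qquad h^{ij} = h\!\left(\frac{\partial}{\partial \eta_i}, \frac{\partial}{\partial \eta_j}\right) = \frac{\partial^2 \psi^*}{\partial \eta_i \partial \eta_j},$$

$$\Gamma_{ijk} = h\!\left(\nabla_{\partial/\partial \theta^i} \frac{\partial}{\partial \theta^j}, \frac{\partial}{\partial \theta^k}\right) = 0, \qquad \Gamma^*_{ijk} = h\!\left(\nabla^*_{\partial/\partial \theta^i} \frac{\partial}{\partial \theta^j}, \frac{\partial}{\partial \theta^k}\right) = \frac{\partial^3 \psi}{\partial \theta^i \partial \theta^j \partial \theta^k},$$

and so on. In order to identify $\psi$, $\psi^*$, $\theta^i$ and $\eta_i$ explicitly without integrating $h_{ij}$ or $\Gamma^*_{ijk}$, we shall search for them by examining whether the conformal divergence $\rho$ can be represented in the form of the canonical divergence [4], i.e.,

$$\rho(p, r) = \psi(\theta(p)) + \psi^*(\eta(r)) - \sum_{i=1}^{n} \theta^i(p)\, \eta_i(r), \qquad (15)$$

with the constraints (14). If this is possible, we can directly prove from (A.4) and (A.5) that the obtained $\psi$, $\psi^*$, $(\theta^1, \dots, \theta^n)$ and $(\eta_1, \dots, \eta_n)$ are pairs of dual potential functions and affine coordinate systems associated with $(\mathcal{S}^n, h, \nabla, \nabla^*)$.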

Before showing the result, we define, for $q > 0$ with $q \neq 1$, two functions by

$$\ln_q(s) := \frac{s^{1-q} - 1}{1-q}, \quad s > 0, \qquad \exp_q(t) := [1 + (1-q)t]_+^{1/(1-q)}, \quad t \in \mathbf{R},$$

where $[t]_+ := \max\{0, t\}$, and the so-called Tsallis entropy [23] by

$$S_q(p) := \frac{\sum_{i=1}^{n+1} (p_i)^q - 1}{1-q}.$$

Note that $s = \exp_q(\ln_q(s))$ holds, and that these functions respectively recover the usual logarithmic function, the exponential function and the Boltzmann-Gibbs-Shannon entropy $-\sum_{i=1}^{n+1} p_i \ln p_i$ when $q \to 1$. For $q > 0$, $\ln_q(s)$ is concave on $s > 0$.
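A direct transcription of these definitions, with a few sanity checks: $\exp_q(\ln_q s) = s$, the $q \to 1$ limits, and concavity of $\ln_q$. The treatment of $q = 1$ and of the cut-off region of $\exp_q$ (zero for $q < 1$, divergent for $q > 1$) is a convention chosen for this sketch.

```python
import math


def ln_q(s, q):
    """q-logarithm; equals log(s) in the limit q -> 1."""
    if q == 1:
        return math.log(s)
    return (s ** (1 - q) - 1) / (1 - q)


def exp_q(t, q):
    """q-exponential [1 + (1 - q) t]_+^(1/(1-q))."""
    if q == 1:
        return math.exp(t)
    base = max(0.0, 1 + (1 - q) * t)
    if base == 0.0:
        # cut-off region: 0 for q < 1, divergent for q > 1
        return 0.0 if q < 1 else math.inf
    return base ** (1 / (1 - q))


def tsallis(p, q):
    """Tsallis entropy S_q(p); Boltzmann-Gibbs-Shannon entropy when q = 1."""
    if q == 1:
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return (sum(pi ** q for pi in p) - 1) / (1 - q)


p = [0.1, 0.2, 0.3, 0.4]
print(exp_q(ln_q(2.5, 1.5), 1.5))           # recovers s = 2.5 up to rounding
print(tsallis(p, 1 - 1e-6), tsallis(p, 1))  # nearly equal
```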

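Theorem 1. For the dually flat space $(\mathcal{S}^n, h, \nabla, \nabla^*)$ defined via $\pm 1$-conformal transformation from $(\mathcal{S}^n, g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$, the associated potential functions $\psi$, $\psi^*$ and the dually flat affine coordinate systems $(\theta^1, \dots, \theta^n)$ and $(\eta_1, \dots, \eta_n)$ are represented as follows:

$$\theta^i(p) = x^*_i(p) - x^*_{n+1}(p), \quad i = 1, \dots, n,$$

$$\eta_i(p) = P_i(p), \quad i = 1, \dots, n,$$

$$\psi(\theta(p)) = -\ln_q(p_{n+1}),$$

$$\psi^*(\eta(p)) = \frac{1}{\kappa}\bigl(\lambda(p) - q\bigr) = \frac{1}{1-q}\left(\sum_{i=1}^{n+1} (\eta_i)^{1/q}\right)^{q} - \frac{1}{1-q},$$

where $\kappa = (1-\alpha^2)/4 = q(1-q)$ is the constant curvature of $(\mathcal{S}^n, g, \nabla^{(\alpha)})$, and $\eta_{n+1} := P_{n+1}(p) = 1 - \sum_{i=1}^{n} P_i(p)$. Further, the coordinate systems $(\theta^1, \dots, \theta^n)$ and $(\eta_1, \dots, \eta_n)$ are $\nabla$- and $\nabla^*$-affine, respectively.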

Proof. As mentioned above, we only have to check that the potential functions $\psi$, $\psi^*$ and the dual affine coordinates $\theta^i$, $\eta_i$ in the statement satisfy (14) and (15) for the conformal divergence $\rho$. First, substitute them directly into the right-hand side of (15) and simplify it using the relation $\eta_{n+1} = 1 - \sum_{i=1}^{n} \eta_i$; then we see that it coincides with $\rho(p, r)$ in (13). Next, since $\ln_q(p_i) = x^*_i(p) - 1/(1-q)$ holds, we can alternatively represent

$$\theta^i(p) = \ln_q(p_i) - \ln_q(p_{n+1}) = \ln_q(p_i) + \psi(\theta(p)), \quad i = 1, \dots, n.$$

Hence, with $\theta^{n+1} = 0$ it holds that

$$1 = \sum_{i=1}^{n+1} p_i = \sum_{i=1}^{n+1} \exp_q\bigl(\theta^i - \psi(\theta)\bigr).$$

Differentiating both sides with respect to $\theta^j$, $j = 1, \dots, n$, and using $\frac{d}{dt}\exp_q(t) = (\exp_q(t))^q$, we have

$$0 = \sum_{i=1}^{n+1} \left(\delta_{ij} - \frac{\partial \psi}{\partial \theta^j}\right)(p_i)^q = (p_j)^q - \frac{\partial \psi}{\partial \theta^j}\, Z_q(p), \quad j = 1, \dots, n.$$

Thus, the left equation of (14) holds. Finally, note that the conformal factor is represented by

$$\lambda(p) = \frac{q}{Z_q(p)} = \frac{q}{\sum_{i=1}^{n+1} (p_i)^q} = \frac{q}{\bigl(\exp_q(S_q(p))\bigr)^{1-q}}. \qquad (16)$$

Using the formula [24]

$$\exp_q\bigl(S_q(p)\bigr) = \exp_{1/q}\bigl(S_{1/q}(P)\bigr),$$

we see that

$$\lambda(p) = q\bigl(\exp_{1/q}(S_{1/q}(P))\bigr)^{q-1} = q\left(\sum_{i=1}^{n+1} (\eta_i)^{1/q}\right)^{q}.$$

Hence, the second equality in the expression of $\psi^*$ holds. The right equation of (14) follows if we again recall $\eta_{n+1} = 1 - \sum_{i=1}^{n} \eta_i$. Q.E.D.
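The statement of Theorem 1 can be checked numerically. The sketch below assumes $\lambda = q/Z_q$ and the closed form $D^{(\alpha)}(p, r) = (1 - \sum_i p_i^q r_i^{1-q})/\kappa$ of the conormal expression for the α-divergence, and compares three expressions for the conformal divergence: the canonical form (15) built from the potentials of Theorem 1, $\langle -P(r), x^*(p) - x^*(r)\rangle$, and $\lambda(r) D^{(\alpha)}(r, p)$. It also compares the two expressions given for $\psi^*$. The points and $q = 0.7$ are arbitrary.

```python
q = 0.7
kappa = q * (1 - q)


def ln_q(s):
    return (s ** (1 - q) - 1) / (1 - q)


def Zq(p):
    return sum(pi ** q for pi in p)


def theta(p):
    """theta^i = ln_q p_i - ln_q p_{n+1}, i = 1..n."""
    return [ln_q(pi) - ln_q(p[-1]) for pi in p[:-1]]


def eta(p):
    """eta_i = P_i(p), the first n escort probabilities."""
    return [pi ** q / Zq(p) for pi in p[:-1]]


def psi(p):
    return -ln_q(p[-1])


def psi_star(p):
    """(lambda(p) - q) / kappa with lambda = q / Z_q."""
    return (q / Zq(p) - q) / kappa


def psi_star_eta(p):
    """Second expression for psi* in Theorem 1, written in eta only."""
    e = eta(p)
    full = e + [1 - sum(e)]
    return sum(t ** (1 / q) for t in full) ** q / (1 - q) - 1 / (1 - q)


def rho_canonical(p, r):
    """Right-hand side of (15)."""
    return psi(p) + psi_star(r) - sum(a * b for a, b in zip(theta(p), eta(r)))


def rho_escort(p, r):
    """<-P(r), x*(p) - x*(r)> with x*_i = p_i^(1-q) / (1-q)."""
    return -sum((ri ** q / Zq(r)) * (pi ** (1 - q) - ri ** (1 - q)) / (1 - q)
                for pi, ri in zip(p, r))


def d_alpha(p, r):
    """Assumed closed form (1 - sum p_i^q r_i^(1-q)) / kappa."""
    return (1 - sum(pi ** q * ri ** (1 - q) for pi, ri in zip(p, r))) / kappa


p = [0.1, 0.2, 0.3, 0.4]
r = [0.25, 0.25, 0.4, 0.1]
print(rho_canonical(p, r), rho_escort(p, r), q / Zq(r) * d_alpha(r, p))
```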

Corollary 1. The escort probabilities $P_i$, $i = 1, \dots, n$, are canonical affine coordinates of the flat affine connection $\nabla^*$ on $\mathcal{S}^n$.
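A finite-difference check of the left equation of (14), $\partial\psi/\partial\theta^j = \eta_j$, which is what makes the escort probabilities the dual coordinates. Here $\psi(\theta)$ is recovered from the normalization $\sum_{i=1}^{n+1}\exp_q(\theta^i - \psi) = 1$ used in the proof of Theorem 1 and solved by bisection; the bracket $[-50, 50]$, the step $h$ and $q = 0.7$ are assumptions of the sketch.

```python
q = 0.7


def ln_q(s):
    return (s ** (1 - q) - 1) / (1 - q)


def exp_q(t):
    # q < 1 here, so the cut-off region simply gives 0
    return max(0.0, 1 + (1 - q) * t) ** (1 / (1 - q))


def psi_of_theta(th, lo=-50.0, hi=50.0):
    """Solve sum_i exp_q(theta^i - psi) = 1 (theta^{n+1} = 0) for psi by bisection."""
    full = list(th) + [0.0]
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(exp_q(t - mid) for t in full) > 1:
            lo = mid      # total mass too large: psi must increase
        else:
            hi = mid
    return (lo + hi) / 2


p = [0.1, 0.2, 0.3, 0.4]
th = [ln_q(pi) - ln_q(p[-1]) for pi in p[:-1]]
Z = sum(pi ** q for pi in p)
escort = [pi ** q / Z for pi in p[:-1]]

h = 1e-5
grad = []
for j in range(len(th)):
    up = th[:j] + [th[j] + h] + th[j + 1:]
    dn = th[:j] + [th[j] - h] + th[j + 1:]
    grad.append((psi_of_theta(up) - psi_of_theta(dn)) / (2 * h))
print(grad, escort)  # gradient of psi versus escort probabilities
```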

Remark 1: Since the conformal factor $\lambda$ in (16) can alternatively be represented by

$$\lambda(p) = \frac{q}{\bigl(\exp_q(S_q(p))\bigr)^{1-q}} = q(q-1)\ln_{2-q}\bigl(\exp_q(S_q(p))\bigr) + q,$$

we have another expression of $\psi^*$, i.e.,

$$\psi^*(\eta(p)) = -\ln_{2-q}\bigl(\exp_q(S_q(p))\bigr).$$

Thus, the potentials and dual coordinates given in Theorem 1 recover the standard ones [3, 4] when $q \to 1$, i.e.,

$$\psi \to -\ln p_{n+1}, \quad \psi^* \to \sum_{i=1}^{n+1} p_i \ln p_i, \quad \theta^i \to \ln(p_i/p_{n+1}), \quad \eta_i \to p_i, \quad i = 1, \dots, n.$$

Note that $-\psi^*$ coincides with the entropy studied in [25, 26, 27] and referred to as the normalized Tsallis entropy. The conformal (or scaling) factor $\lambda$ often appears in the study of q-analysis.
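The identities used in the proof of Theorem 1 and in Remark 1 can be checked at an arbitrary point: formula [24], the two expressions of $\lambda$, the expression $\psi^* = -\ln_{2-q}(\exp_q(S_q(p)))$, the relation $-\psi^* = S_q(p)/Z_q(p)$ (normalized Tsallis entropy), and the $q \to 1$ limit of $\psi^*$. The sketch assumes $\lambda = q/Z_q$; $p$ and $q = 0.7$ are arbitrary.

```python
import math


def ln_q(s, q):
    return (s ** (1 - q) - 1) / (1 - q)


def exp_q(t, q):
    return max(0.0, 1 + (1 - q) * t) ** (1 / (1 - q))


def tsallis(p, q):
    return (sum(pi ** q for pi in p) - 1) / (1 - q)


def psi_star(p, q):
    """psi* = (lambda - q) / kappa with lambda = q / Z_q(p)."""
    Zq = sum(pi ** q for pi in p)
    return (q / Zq - q) / (q * (1 - q))


q = 0.7
p = [0.1, 0.2, 0.3, 0.4]
Zq = sum(pi ** q for pi in p)
P = [pi ** q / Zq for pi in p]                     # escort distribution
e1 = exp_q(tsallis(p, q), q)
e2 = exp_q(tsallis(P, 1 / q), 1 / q)               # right-hand side of formula [24]
lam = q / Zq

print(e1, e2)
print(lam, q * e1 ** (q - 1), q * (q - 1) * ln_q(e1, 2 - q) + q)
print(psi_star(p, q), -ln_q(e1, 2 - q), -tsallis(p, q) / Zq)
print(psi_star(p, 1 - 1e-6), sum(pi * math.log(pi) for pi in p))
```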

Remark 2: Similarly to the above conformal transformation of $(\mathcal{S}^n, g, \nabla^{(\alpha)})$, we can define another one for $(\mathcal{S}^n, g, \nabla^{(-\alpha)})$ with a conformal factor $\lambda'(p)$ and construct another dually flat structure $(h' = \lambda' g, \nabla', \nabla'^*)$. Then the following relations among them hold (see Figure 1).

Figure 1. Relations among geometries. Horizontal pairs are mutually dual; each vertical link is labeled with its type of conformal equivalence.

$(\mathcal{S}^n, h', \nabla')$ ⟷ $(\mathcal{S}^n, h', \nabla'^*)$
(left: 1-conformally equivalent; right: $-1$-conformally equivalent)
$(\mathcal{S}^n, g, \nabla^{(\alpha)})$ ⟷ $(\mathcal{S}^n, g, \nabla^{(-\alpha)})$
(left: $-1$-conformally equivalent; right: 1-conformally equivalent)
$(\mathcal{S}^n, h, \nabla^*)$ ⟷ $(\mathcal{S}^n, h, \nabla)$

Remark 3: Because of the projective equivalence (11), a submanifold in $\mathcal{S}^n$ is $\nabla^{(\alpha)}$-autoparallel if and only if it is $\nabla^*$-autoparallel. In particular, the set of distributions constrained with the normalized q-expectations (escort averages) [2] is a simultaneously $\nabla^{(\alpha)}$- and $\nabla^*$-autoparallel submanifold in $\mathcal{S}^n$.
4. Applications to construction of alpha-Voronoi diagrams and alpha-centroids

For given m points $p_1, \ldots, p_m$ on $S^n$ we define α-Voronoi regions on $S^n$ using the α-divergence as follows:

$$\mathrm{Vor}^{(\alpha)}(p_k) := \bigcap_{l \neq k} \{ p \in S^n \mid D^{(\alpha)}(p, p_k) < D^{(\alpha)}(p, p_l) \}, \quad k = 1, \ldots, m.$$

An α-Voronoi diagram on $S^n$ is a collection of the α-Voronoi regions and their boundaries. Note that $D^{(\alpha)}$ approaches the Kullback-Leibler divergence as $\alpha \to -1$, and $D^{(0)}$ is called the Hellinger distance. Suppose we use the Rényi divergence of order $q \neq 1$ [28], defined by

$$D_q(p, r) := \frac{1}{q-1} \ln \sum_{i=1}^{n+1} (p_i)^q (r_i)^{1-q},$$

instead of the α-divergence. Then $\mathrm{Vor}^{(1-2q)}(p_k)$ gives the corresponding Voronoi region, because the two divergences are related by a one-to-one (indeed strictly increasing) function.
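Because the two divergences are monotonically related, both definitions of the Voronoi regions should pick the same site for every query point. The sketch below checks this on $S^2$ under an assumed normalization, $D^{(\alpha)}(p,r) = \{1 - \sum_i p_i^q r_i^{1-q}\}/\{q(1-q)\}$ with $\alpha = 1 - 2q$. The paper's own definition appears earlier and is not reproduced here; a positive constant factor would leave the regions unchanged. The helper names are hypothetical.

```python
import numpy as np

def alpha_divergence(p, r, q):
    """Assumed form of D^(alpha)(p, r) with alpha = 1 - 2q (q != 0, 1)."""
    s = np.sum(p ** q * r ** (1.0 - q))
    return (1.0 - s) / (q * (1.0 - q))

def renyi_divergence(p, r, q):
    """Renyi divergence D_q(p, r) = ln(sum_i p_i**q r_i**(1-q)) / (q - 1)."""
    s = np.sum(p ** q * r ** (1.0 - q))
    return np.log(s) / (q - 1.0)

def voronoi_label(p, sites, divergence, q):
    """Index k minimizing divergence(p, p_k), i.e. the region containing p."""
    return int(np.argmin([divergence(p, s, q) for s in sites]))

rng = np.random.default_rng(0)
sites = rng.dirichlet(np.ones(3), size=5)      # p_1, ..., p_5 on S^2
queries = rng.dirichlet(np.ones(3), size=200)  # query points on S^2
for q in (0.2, 1.5):                           # alpha = 0.6 and alpha = -2
    a = [voronoi_label(x, sites, alpha_divergence, q) for x in queries]
    b = [voronoi_label(x, sites, renyi_divergence, q) for x in queries]
    print(q, a == b)
```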

The standard algorithm using projection of a polyhedron [29, 6] works well for constructing Voronoi diagrams for the Euclidean distance [6], the Kullback-Leibler divergence [11] and Bregman divergences [12]. The algorithm is applicable if a distance function is represented as the remainder of the first-order Taylor expansion of a convex potential function in a suitable coordinate system. Geometrically speaking, this is satisfied if i) the divergence is the canonical one for a certain dually flat structure and ii) its affine coordinate system is chosen to realize the corresponding Voronoi diagrams. In this coordinate system, with one extra complementary coordinate, the polyhedron is expressed as the upper envelope of m hyperplanes tangent to the graph of the potential function.
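The simplest instance of this lifting idea uses the potential $F(x) = |x|^2$. Its Taylor remainder is the squared Euclidean distance, so the nearest site is the one whose tangent hyperplane is highest at $x$. The minimal sketch below illustrates this; the potential and the random point sets are illustrative choices, not taken from the paper.

```python
import numpy as np

def F(x):
    """Convex potential F(x) = |x|^2 (illustrative choice)."""
    return float(np.dot(x, x))

def grad_F(x):
    return 2.0 * x

def tangent_height(c, x):
    """Height at x of the hyperplane tangent to the graph of F at the site c."""
    return F(c) + float(np.dot(grad_F(c), x - c))

def taylor_remainder(x, c):
    """F(x) minus its first-order Taylor expansion at c (here |x - c|^2)."""
    return F(x) - tangent_height(c, x)

rng = np.random.default_rng(1)
sites = rng.normal(size=(6, 2))
points = rng.normal(size=(300, 2))
# Nearest site versus highest tangent hyperplane (upper envelope).
nearest = [int(np.argmin([np.sum((x - c) ** 2) for c in sites])) for x in points]
envelope = [int(np.argmax([tangent_height(c, x) for c in sites])) for x in points]
print(nearest == envelope)
```

Projecting the upper envelope therefore reproduces the ordinary Voronoi diagram. For a general convex potential, only `F` and `grad_F` change.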

A problem for the case of the α-Voronoi diagram is that the α-divergence on $S^n$ cannot be represented as a remainder of any convex potential. The following theorem, however, claims that the problem is resolved by conformally transforming the α-geometry to the dually flat structure $(h, \nabla, \nabla^*)$ and using the conformal divergence $\rho$ and escort probabilities as a coordinate system.

Here, we denote a point on $\mathcal{E}^n$ by $P = (P_1, \ldots, P_n)$ because $P_{n+1} = 1 - \sum_{i=1}^{n} P_i$.

Theorem 2 i) The bisector of $p_k$ and $p_l$ defined by $\{p \mid D^{(\alpha)}(p, p_k) = D^{(\alpha)}(p, p_l)\}$ is a simultaneously $\nabla^{(\alpha)}$- and $\nabla^*$-autoparallel hypersurface on $S^n$.
ii) Let $\mathcal{H}_k$, $k = 1, \ldots, m$, be the hyperplane in $\mathcal{E}^n \times \mathbb{R}$ which is tangent at $(P_k, \psi^*(P_k))$ to the hypersurface $\{(P, y) \mid y = \psi^*(P)\}$, where $P_k = P(p_k)$. The α-Voronoi diagram can be constructed on $\mathcal{E}^n$ as the projection of the upper envelope of the $\mathcal{H}_k$'s along the y-axis.

Proof) i) Consider the $\nabla^{(\alpha)}$-geodesic $\gamma^{(\alpha)}$ connecting $p_k$ and $p_l$, and let $\bar p$ be the midpoint on $\gamma^{(\alpha)}$ satisfying $D^{(\alpha)}(\bar p, p_k) = D^{(\alpha)}(\bar p, p_l)$. Denote by $B$ the $\nabla^{(\alpha)}$-autoparallel hypersurface that is orthogonal to $\gamma^{(\alpha)}$ and contains $\bar p$. Then, for all $r \in B$, the modified Pythagorean theorem [20, 7] implies the following equality:

$$D^{(\alpha)}(r, p_k) = D^{(\alpha)}(r, \bar p) + D^{(\alpha)}(\bar p, p_k) - \kappa D^{(\alpha)}(r, \bar p) D^{(\alpha)}(\bar p, p_k)$$
$$= D^{(\alpha)}(r, \bar p) + D^{(\alpha)}(\bar p, p_l) - \kappa D^{(\alpha)}(r, \bar p) D^{(\alpha)}(\bar p, p_l) = D^{(\alpha)}(r, p_l).$$

Hence, $B$ is a bisector of $p_k$ and $p_l$. The projective equivalence ensures that $B$ is also $\nabla^*$-autoparallel.

Figure 2. An example of α-Voronoi diagram on $S^2$ (left) for α = 0.6 (or q = 0.2) and the corresponding one on $\mathcal{E}^2$ (right).

Figure 3. An example of α-Voronoi diagram on $S^2$ (left) for α = -2 (or q = 1.5) and the corresponding one on $\mathcal{E}^2$ (right).

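ii) Recall the equality $D^{(\alpha)}(p, r) = D^{(-\alpha)}(r, p)$ and the conformal relation (13) between $D^{(-\alpha)}$ and $\rho$. Then we see that $\mathrm{Vor}^{(\alpha)}(p_k) = \mathrm{Vor}^{(\mathrm{conf})}(p_k)$ holds on $S^n$, where

$$\mathrm{Vor}^{(\mathrm{conf})}(p_k) := \bigcap_{l \neq k} \{ p \in S^n \mid \rho(p_k, p) < \rho(p_l, p) \}.$$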
Further, Theorem 1 and relations (14) and (15) imply that $\rho(p_k, p)$ is represented in the escort coordinates by

$$\rho(p_k, p) = \psi^*(P) - \left( \psi^*(P_k) + \sum_{i=1}^{n} \frac{\partial \psi^*}{\partial P_i}(P_k)\,(P_i(p) - P_i(p_k)) \right),$$

where $P = P(p)$. Note that a point $(P, y_k(P))$ in $\mathcal{H}_k$ is expressed by

$$y_k(P) := \psi^*(P_k) + \sum_{i=1}^{n} \frac{\partial \psi^*}{\partial P_i}(P_k)\,(P_i(p) - P_i(p_k)).$$

Hence, we have $\rho(p_k, p) = \psi^*(P) - y_k(P)$. We see, for example, that the bisector on $\mathcal{E}^n$ for $p_k$ and $p_l$ is represented as a projection of $\mathcal{H}_k \cap \mathcal{H}_l$. Thus, the statement follows. Q.E.D.

Figures 2 and 3 show examples of α-Voronoi diagrams on the simplex of dimension 2. In these cases, the bisectors are simultaneously $\nabla^{(\alpha)}$- and $\nabla^*$-geodesics.

Remark: In [30], Voronoi diagrams for a broader class of divergences (contrast functions) that are not necessarily associated with any convex potential are studied from a more general affine differential geometric point of view. A construction algorithm is also given there, which is applicable if the corresponding affine immersion is explicitly obtained.

On the other hand, the α-divergence, when extended from $S^n$ to the positive orthant $\mathbb{R}^{n+1}_+$, can be represented as a remainder of the potential in (7) [3, 4, 8]. Hence, the α-geometry on $\mathbb{R}^{n+1}_+$ is dually flat. Using this property, α-Voronoi diagrams on $\mathbb{R}^{n+1}_+$ are discussed in [31].

Both of the above methods require computing polyhedra in a space of dimension $n + 2$, whereas the method proposed in this paper works in a space of dimension $n + 1$. The optimal computational time for polyhedra depends on the dimension $d$ as $O(m \log m + m^{\lfloor d/2 \rfloor})$ [32]. The new method, for which $d = n + 1$, is therefore slightly better when $n$ is even.
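The parity claim can be checked directly by comparing the exponents $\lfloor (n+2)/2 \rfloor$ and $\lfloor (n+1)/2 \rfloor$ in the bound quoted above. The helper name is illustrative.

```python
def hull_exponent(d):
    """Exponent floor(d/2) in the O(m log m + m**floor(d/2)) bound."""
    return d // 2

for n in range(1, 9):
    old = hull_exponent(n + 2)   # lifting in dimension n + 2
    new = hull_exponent(n + 1)   # escort-coordinate lifting, dimension n + 1
    print(n, old, new, "smaller" if new < old else "equal")
```

For odd $n$ the two exponents coincide. For even $n$ the new exponent is smaller by one.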


The next proposition is a simple and relevant application of escort probabilities. Define the α-centroid $c^{(\alpha)}$ for given m points $p_1, \ldots, p_m$ on $S^n$ as the minimizer of the following problem:

$$\min_{p \in S^n} \sum_{k=1}^{m} D^{(\alpha)}(p_k, p).$$


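Proposition 1 The α-centroid for given m points $p_1, \ldots, p_m$ on $S^n$ is represented in escort probabilities as the weighted average of the $P_i(p_k)$. The weights are the $Z_q(p_k)$, the reciprocals of the conformal factors $\lambda(p_k) = 1/Z_q(p_k)$, i.e.,

$$P_i(c^{(\alpha)}) = \frac{1}{\sum_{k=1}^{m} Z_q(p_k)} \sum_{k=1}^{m} Z_q(p_k)\,P_i(p_k), \quad i = 1, \ldots, n+1.$$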


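Proof) Let $\theta^i = \theta^i(p)$. Using (13), (15) and the relation $D^{(\alpha)}(p, r) = D^{(-\alpha)}(r, p)$, we have

$$\sum_{k=1}^{m} D^{(\alpha)}(p_k, p) = \sum_{k=1}^{m} Z_q(p_k)\,\rho(p, p_k) = \sum_{k=1}^{m} Z_q(p_k) \left\{ \psi(\theta) + \psi^*(\eta(p_k)) - \sum_{i=1}^{n} \theta^i \eta_i(p_k) \right\}.$$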
Then, since $\partial \psi / \partial \theta^i = \eta_i$, the optimality condition is

$$\frac{\partial}{\partial \theta^i} \sum_{k=1}^{m} D^{(\alpha)}(p_k, p) = \sum_{k=1}^{m} Z_q(p_k)\,(\eta_i - \eta_i(p_k)) = 0, \quad i = 1, \ldots, n,$$

where $\eta_i = \eta_i(p)$. Thus, the statement follows from Theorem 1 for $i = 1, \ldots, n$. For $i = n + 1$ it follows because the normalized weights $Z_q(p_k)/\sum_{l} Z_q(p_l)$ sum to one and $P_{n+1} = 1 - \sum_{i=1}^{n} P_i$. Q.E.D.
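Proposition 1 gives the α-centroid in closed form. One first averages the escort coordinates with weights $Z_q(p_k)$ and then inverts the escort map using $c_i \propto P_i(c)^{1/q}$. The sketch below computes the centroid this way and compares the objective against random competitors on $S^3$. It assumes the normalization $D^{(\alpha)}(p,r) = \{1 - \sum_i p_i^q r_i^{1-q}\}/\{q(1-q)\}$; a positive constant factor would not move the minimizer. The function names are hypothetical.

```python
import numpy as np

def alpha_divergence(p, r, q):
    """Assumed D^(alpha)(p, r) = (1 - sum p^q r^(1-q)) / (q (1 - q)), alpha = 1 - 2q."""
    return (1.0 - np.sum(p ** q * r ** (1.0 - q))) / (q * (1.0 - q))

def escort(p, q):
    """All n+1 escort probabilities p_i**q / Z_q(p)."""
    w = p ** q
    return w / w.sum()

def alpha_centroid(points, q):
    """Z_q(p_k)-weighted average of escort coordinates, mapped back to S^n."""
    z = np.array([np.sum(pk ** q) for pk in points])          # Z_q(p_k)
    P = np.array([escort(pk, q) for pk in points])            # P(p_k)
    P_c = (z[:, None] * P).sum(axis=0) / z.sum()              # Proposition 1
    c = P_c ** (1.0 / q)                                      # c_i proportional to P_i**(1/q)
    return c / c.sum()

def objective(points, p, q):
    """Sum over k of D^(alpha)(p_k, p)."""
    return sum(alpha_divergence(pk, p, q) for pk in points)

rng = np.random.default_rng(3)
points = rng.dirichlet(np.ones(4), size=6)
for q in (0.2, 1.5):
    c = alpha_centroid(points, q)
    trials = rng.dirichlet(np.ones(4), size=200)
    print(q, all(objective(points, c, q) <= objective(points, t, q) for t in trials))
```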

5.  Concluding  remarks 

We have considered ±1-conformal transformations of the α-geometry and obtained the dually flat structure $(S^n, h, \nabla, \nabla^*)$. Further, the potential functions and dually flat coordinate systems associated with the structure have been derived. The escort probability appears naturally and plays an important role.

From the viewpoint of contrast functions, the geometric structure compatible with the Kullback-Leibler divergence is $(S^n, g, \nabla^{(1)}, \nabla^{(-1)})$. Here $g$ is the Fisher information, and $\nabla^{(1)}$ and $\nabla^{(-1)}$ are respectively the e-connection and the m-connection. Similarly, the α-divergence (or the Tsallis relative entropy) and the conformal divergence $\rho$ in this note correspond to $(S^n, g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ and $(S^n, h, \nabla, \nabla^*)$, respectively. They are summarized in Figure 4.

KL divergence: $(S^n, g, \nabla^{(1)}, \nabla^{(-1)})$, dually flat
α-divergence: $(S^n, g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$, constant curvature $\kappa$
conformal divergence: $(S^n, h, \nabla, \nabla^*)$ and $(S^n, h', \nabla', \nabla'^*)$, dually flat

Figure 4. Transformations of dualistic structures

The physical meaning or essence underlying these transformations would be interesting and significant, but it remains unclear (see recent publications [33, 34] for such research directions).

Finally, we have shown a direct application of the conformal flattening to the computation of α-Voronoi diagrams and α-centroids. Escort probabilities are found to work as a suitable coordinate system for this purpose.
Appendix A: Statistical manifold and α-conformal equivalence

For details of this appendix see [17, 19, 20, 21, 22]. For a torsion-free affine connection $\nabla$ and a pseudo-Riemannian metric $g$ on a manifold $\mathcal{M}$, the triple $(\mathcal{M}, g, \nabla)$ is called a statistical manifold if it admits another torsion-free connection $\nabla^*$ satisfying

$$X g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \nabla^*_X Z) \qquad (A.1)$$

for arbitrary $X$, $Y$ and $Z$ in $\mathcal{X}(\mathcal{M})$, where $\mathcal{X}(\mathcal{M})$ is the set of all tangent vector fields on $\mathcal{M}$. It is known that $(\mathcal{M}, g, \nabla)$ is a statistical manifold if and only if $\nabla g$ is symmetric, i.e., $(\nabla_X g)(Y, Z)$ is symmetric with respect to $X$, $Y$ and $Z$. We call $\nabla$ and $\nabla^*$ duals of each other with respect to $g$, and $(\mathcal{M}, g, \nabla^*)$ is called the dual statistical manifold of $(\mathcal{M}, g, \nabla)$. The triple of a Riemannian metric and a pair of dual connections $(g, \nabla, \nabla^*)$ satisfying (A.1) is called a dualistic structure on $\mathcal{M}$.

For $\alpha \in \mathbb{R}$, statistical manifolds $(\mathcal{M}, g, \nabla)$ and $(\mathcal{M}, g', \nabla')$ are said to be α-conformally equivalent if there exists a positive function $\phi$ on $\mathcal{M}$ such that

$$g'(X, Y) = \phi\,g(X, Y),$$
$$g(\nabla'_X Y, Z) = g(\nabla_X Y, Z) - \frac{1+\alpha}{2}\,d(\ln\phi)(Z)\,g(X, Y) + \frac{1-\alpha}{2}\{d(\ln\phi)(X)\,g(Y, Z) + d(\ln\phi)(Y)\,g(X, Z)\}.$$

Statistical manifolds $(\mathcal{M}, g, \nabla)$ and $(\mathcal{M}, g', \nabla')$ are α-conformally equivalent if and only if $(\mathcal{M}, g, \nabla^*)$ and $(\mathcal{M}, g', \nabla'^*)$ are $-\alpha$-conformally equivalent.

A statistical manifold $(\mathcal{M}, g, \nabla)$ is called α-conformally flat if it is locally α-conformally equivalent to a flat statistical manifold. Note that $-1$-conformal equivalence implies projective equivalence. A statistical manifold of dimension greater than three has constant curvature if and only if it is ±1-conformally flat.

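We call a function $\rho$ on $\mathcal{M} \times \mathcal{M}$ a contrast function [35] inducing $(g, \nabla, \nabla^*)$ if it satisfies

$$\rho(p, p) = 0, \quad p \in \mathcal{M}, \qquad (A.2)$$
$$\rho[X|] = \rho[|Y] = 0, \qquad (A.3)$$
$$g(X, Y) = -\rho[X|Y], \qquad (A.4)$$
$$g(\nabla_X Y, Z) = -\rho[XY|Z], \quad g(Y, \nabla^*_X Z) = -\rho[Y|XZ], \qquad (A.5)$$

where

$$\rho[X_1 \cdots X_k | Y_1 \cdots Y_l](p) := (X_1)_p \cdots (X_k)_p (Y_1)_q \cdots (Y_l)_q\,\rho(p, q)\big|_{p=q}$$

for arbitrary $p, q \in \mathcal{M}$ and $X_i, Y_j \in \mathcal{X}(\mathcal{M})$. If $(\mathcal{M}, g, \nabla)$ and $(\mathcal{M}, g', \nabla')$ are 1-conformally equivalent, a contrast function $\rho'$ inducing $(g', \nabla', \nabla'^*)$ is represented by $\rho$ inducing $(g, \nabla, \nabla^*)$ as

$$\rho'(p, q) = \phi(q)\,\rho(p, q).$$

Appendix B: The proof for the fact that $(S^n, h, \nabla^*)$ is a statistical manifold

We show that $\nabla^* h$ is symmetric. By the definition of $-1$-conformal flatness we have

$$(\nabla^*_X h)(Y, Z) = X h(Y, Z) - h(\nabla^*_X Y, Z) - h(Y, \nabla^*_X Z)$$
$$= d\lambda(X)\,g(Y, Z) + \lambda\,X g(Y, Z)$$
$$\quad - \lambda\{g(\nabla^{(\alpha)}_X Y, Z) + d(\ln\lambda)(Y)\,g(X, Z) + d(\ln\lambda)(X)\,g(Y, Z)\}$$
$$\quad - \lambda\{g(Y, \nabla^{(\alpha)}_X Z) + d(\ln\lambda)(Z)\,g(X, Y) + d(\ln\lambda)(X)\,g(Z, Y)\}.$$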
Substitute the equality $\lambda\,d(\ln\lambda) = d\lambda$ into the right-hand side; then it is transformed to

$$\lambda\{X g(Y, Z) - g(\nabla^{(\alpha)}_X Y, Z) - g(Y, \nabla^{(\alpha)}_X Z) - d(\ln\lambda)(X)\,g(Y, Z) - d(\ln\lambda)(Y)\,g(X, Z) - d(\ln\lambda)(Z)\,g(X, Y)\}$$
$$= \lambda(\nabla^{(\alpha)}_X g)(Y, Z) - \lambda\{d(\ln\lambda)(X)\,g(Y, Z) + d(\ln\lambda)(Y)\,g(X, Z) + d(\ln\lambda)(Z)\,g(X, Y)\}.$$

Thus, $\nabla^* h$ is symmetric because $(S^n, g, \nabla^{(\alpha)})$ is a statistical manifold, i.e., $\nabla^{(\alpha)} g$ is symmetric. Since $\nabla^{(\alpha)}$ is torsion-free, so is $\nabla^*$ by the definition of $-1$-conformal flatness.
Geometry for q-exponential families is studied in this paper. A q-exponential family is a set of probability distributions that is a natural generalization of the standard exponential family. A q-exponential family has an information geometric structure and a dually flat structure. To describe the relations between them, generalized conformal structures for statistical manifolds are studied in this paper. As an application of geometry for q-exponential families, a geometric generalization of statistical inference is also studied.

Keywords: q-exponential family, q-product, Information geometry, Tsallis statistics, Statistical manifold, Divergence.


Introduction 

An exponential family is a set of probability distributions, such as the set of normal distributions, of Poisson distributions, or of gamma distributions. Such probability distributions decay exponentially. However, in complex systems, probability distributions often have long tails, that is, they do not decay exponentially. The q-normal distribution, which is frequently discussed in Tsallis nonextensive statistical mechanics [18], is a typical example of such probability distributions.

In this paper, we consider q-exponential families. A q-exponential family is a natural generalization of the standard exponential family, and it includes the set of q-normal distributions. From the viewpoint of information geometry, it is known that an exponential family has a dually flat structure (see [1]). We will see that q-exponential families naturally have dually flat structures.

A q-exponential family also has an information geometric structure, that is, a q-exponential family has the Fisher metric and α-connections. Hence a q-exponential family has two kinds of statistical manifold structures. Thus, we consider relations between these structures using generalized conformal equivalence relations on statistical manifolds.

In the latter part of this paper, we consider statistical inference for q-exponential families. Generalizations of independence or likelihood functions have been introduced in machine learning theory [4] and in Tsallis statistics [16]. We show that dually flat structures on q-exponential families work naturally for such generalized statistical inference.

1.  Preliminaries 

In this section, we review geometry of statistical models and related geometry (cf. [1, 15]). We assume that all objects are smooth throughout this paper. We also assume that the manifold is simply connected since we will discuss geometry of statistical models.


1.1.  Statistical  models 


Let $\mathcal{X}$ be a total sample space and let $\Xi$ be an open domain of $\mathbb{R}^n$. We say that $S$ is a statistical model or a parametric model on $\mathcal{X}$ if $S$ is a set of probability densities with parameter $\xi \in \Xi$ such that

$$S = \left\{ p(x; \xi) \;\middle|\; \int_{\mathcal{X}} p(x; \xi)\,dx = 1,\; p(x; \xi) > 0,\; \xi \in \Xi \subset \mathbb{R}^n \right\}.$$

Under suitable conditions, $S$ can be regarded as a manifold with a local coordinate system $\{\xi^1, \ldots, \xi^n\}$ (see [1]).

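For a statistical model $S$, we define a function $g^F_{ij}(\xi) : \Xi \to \mathbb{R}$ by the following formula:

$$g^F_{ij}(\xi) := \int_{\mathcal{X}} \left( \frac{\partial}{\partial \xi^i} \log p(x; \xi) \right) \left( \frac{\partial}{\partial \xi^j} \log p(x; \xi) \right) p(x; \xi)\,dx = E_\xi[\partial_i l_\xi\, \partial_j l_\xi].$$

Here, for simplicity, we used the following notations:

$E_\xi[f] = \int_{\mathcal{X}} f(x)\,p(x; \xi)\,dx$ (the expectation of $f(x)$ at $p(x; \xi)$),

$l_\xi = l(x; \xi) = \log p(x; \xi)$ (the log likelihood of $p(x; \xi)$),

$\partial_i = \partial / \partial \xi^i$.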
We assume that $g^F_{ij}(\xi)$ is finite for all $i, j$. Setting $g^F = (g^F_{ij})$, we can check that $g^F$ is symmetric and non-negative definite. We assume that $g^F$ is positive definite. Then $g^F$ is a Riemannian metric on $S$. We call $g^F$ the Fisher metric on $S$.

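For $\alpha \in \mathbb{R}$, we define the α-connection $\nabla^{(\alpha)}$ by the following formulas:

$$\Gamma^{(\alpha)}_{ij,k}(\xi) = E_\xi\left[ \left( \partial_i \partial_j l_\xi + \frac{1-\alpha}{2}\,\partial_i l_\xi\,\partial_j l_\xi \right) (\partial_k l_\xi) \right],$$
$$g^F(\nabla^{(\alpha)}_{\partial_i} \partial_j, \partial_k) = \Gamma^{(\alpha)}_{ij,k}.$$

We can check that $\nabla^{(\alpha)}$ is torsion-free and $\nabla^{(0)}$ is the Levi-Civita connection of the Fisher metric. It is known that the ±1-connections are more important than the Levi-Civita connection in the geometric theory of statistical inference. We call $\nabla^{(1)}$ the exponential connection and $\nabla^{(-1)}$ the mixture connection.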

For α-connections, the following formula holds:

$$X g^F(Y, Z) = g^F(\nabla^{(\alpha)}_X Y, Z) + g^F(Y, \nabla^{(-\alpha)}_X Z).$$

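The connections $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ are said to be dual (or conjugate) with respect to $g^F$. For arbitrary $\alpha, \beta \in \mathbb{R}$, the difference between the α-connection and the β-connection is given by

$$\Gamma^{(\beta)}_{ij,k} - \Gamma^{(\alpha)}_{ij,k} = \frac{\alpha - \beta}{2}\,C^F_{ijk},$$

where

$$C^F_{ijk}(\xi) = E_\xi[\partial_i l_\xi\,\partial_j l_\xi\,\partial_k l_\xi].$$

The (0, 3)-tensor field $C^F$ determined by $C^F_{ijk}$ is called a cubic form. The covariant derivative of the Fisher metric $g^F$ satisfies $(\nabla^{(\alpha)}_X g^F)(Y, Z) = \alpha\,C^F(X, Y, Z)$.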

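We say that a statistical model $S$ is an exponential family if

$$S = \left\{ p(x; \theta) \;\middle|\; p(x; \theta) = \exp\left[ Z(x) + \sum_{i=1}^{n} \theta^i F_i(x) - \psi(\theta) \right],\; \theta \in \Theta \subset \mathbb{R}^n \right\},$$

where $\Theta$ is a parameter space, $Z, F_1, \ldots, F_n$ are random variables on $\mathcal{X}$ and $\psi$ is a function on $\Theta$. The coordinate system $\{\theta^i\}$ is called the natural parameters.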

Proposition 1.1. For an exponential family $S$, the natural parameters $\{\theta^i\}$ form an affine coordinate system with respect to $\nabla^{(1)}$, that is, $\Gamma^{(1)\,k}_{ij} = 0$ $(i, j, k = 1, \ldots, n)$, and the 1-connection is flat.
For simplicity, we set $Z = 0$. This condition can be assumed without loss of generality. We say that $M$ is a curved exponential family of $S$ if $M$ is a submanifold of $S$ such that

$$M = \{ p(x; \theta(u)) \mid p(x; \theta(u)) \in S,\; u \in U \subset \mathbb{R}^m \}.$$


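Example 1.1 (normal distributions). Let $S$ be the set of normal distributions,

$$S = \left\{ p(x; \mu, \sigma) \;\middle|\; p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right] \right\}.$$

Here, the sample space $\mathcal{X}$ is $\mathbb{R}$, and the parameter space is the upper half plane $\{(\mu, \sigma) \mid -\infty < \mu < \infty,\; 0 < \sigma < \infty\}$.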

The Fisher metric in $(\mu, \sigma)$-coordinates is given by

$$(g^F_{ij}) = \frac{1}{\sigma^2} \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.$$

Hence $S$ is a space of constant negative curvature $-1/2$.
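The Fisher metric of the normal family can be recovered directly from its definition as $E_\xi[\partial_i l_\xi\,\partial_j l_\xi]$. The minimal sketch below writes down the analytic scores $\partial_\mu l$ and $\partial_\sigma l$ and integrates their products against the density with the trapezoidal rule. The function name and the quadrature settings (grid size, truncation at 12σ) are illustrative choices.

```python
import numpy as np

def fisher_metric_normal(mu, sigma, num=200001, width=12.0):
    """E[score score^T] for N(mu, sigma^2) in (mu, sigma) coordinates,
    by trapezoidal quadrature on [mu - width*sigma, mu + width*sigma]."""
    x = np.linspace(mu - width * sigma, mu + width * sigma, num)
    p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    score = np.stack([(x - mu) / sigma ** 2,                        # d log p / d mu
                      -1.0 / sigma + (x - mu) ** 2 / sigma ** 3])   # d log p / d sigma
    dx = x[1] - x[0]
    w = np.full(num, dx)                                            # trapezoid weights
    w[0] = w[-1] = dx / 2
    return (score * p * w) @ score.T

g = fisher_metric_normal(0.7, 1.8)
print(g)
print(np.array([[1.0, 0.0], [0.0, 2.0]]) / 1.8 ** 2)
```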
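Let us change parameters as follows. Set

$$\theta^1 = \frac{\mu}{\sigma^2}, \quad \theta^2 = -\frac{1}{2\sigma^2},$$
$$Z(x) = 0, \quad F_1(x) = x, \quad F_2(x) = x^2,$$
$$\psi(\theta) = \frac{\mu^2}{2\sigma^2} + \log(\sqrt{2\pi}\,\sigma) = -\frac{(\theta^1)^2}{4\theta^2} + \frac{1}{2}\log\left(-\frac{\pi}{\theta^2}\right);$$

then we obtain

$$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right] = \exp\left[ \frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}x^2 - \frac{\mu^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma) \right] = \exp\left[ x\theta^1 + x^2\theta^2 - \psi(\theta) \right].$$

This implies that the set of normal distributions is an exponential family.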

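For an exponential family, the Fisher metric and the cubic form in $\{\theta^i\}$-coordinates are given by

$$g^F_{ij}(\theta) = \partial_i \partial_j \psi(\theta), \qquad (1)$$
$$C^F_{ijk}(\theta) = \partial_i \partial_j \partial_k \psi(\theta). \qquad (2)$$

The expectation parameters $\{\eta_i\}$ are given by $\eta_i = E[F_i(x)]$, and $\{\eta_i\}$ is a $\nabla^{(-1)}$-affine coordinate system.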
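For the normal family of Example 1.1, the statements above can be checked with finite differences. The gradient of $\psi$ should equal the expectation parameters $\eta = (E[x], E[x^2]) = (\mu, \mu^2 + \sigma^2)$. By (1), its Hessian should equal the Fisher metric in $\theta$-coordinates, which for an exponential family is the covariance matrix of $(F_1, F_2) = (x, x^2)$. This is a numerical sketch; the step sizes and helper names are illustrative choices.

```python
import numpy as np

def natural_params(mu, sigma):
    """theta^1 = mu/sigma^2, theta^2 = -1/(2 sigma^2), as in Example 1.1."""
    return np.array([mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2)])

def psi(theta):
    """psi(theta) = -(theta^1)^2/(4 theta^2) + (1/2) log(-pi/theta^2)."""
    t1, t2 = theta
    return -t1 ** 2 / (4.0 * t2) + 0.5 * np.log(-np.pi / t2)

def grad(f, theta, h):
    """Central-difference gradient of f at theta."""
    e = np.eye(len(theta))
    return np.array([(f(theta + h * e[i]) - f(theta - h * e[i])) / (2 * h)
                     for i in range(len(theta))])

mu, sigma = 0.5, 1.5
theta = natural_params(mu, sigma)
eta = grad(psi, theta, 1e-6)                 # expected: (mu, mu^2 + sigma^2)
# Row i is the gradient of eta_i, i.e. the Hessian of psi as in (1).
hess = np.array([grad(lambda t, i=i: grad(psi, t, 1e-6)[i], theta, 1e-4) for i in range(2)])
print(eta)
print(hess)
```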
1.2. Statistical manifolds

Let $(M, h)$ be a semi-Riemannian manifold, and let $\nabla$ be a torsion-free affine connection on $M$. We say that the triplet $(M, \nabla, h)$ is a statistical manifold if $\nabla h$ is a totally symmetric (0, 3)-tensor field. Obviously, a statistical model has many statistical manifold structures; for instance, $(S, \nabla^{(\alpha)}, g^F)$ is one for every $\alpha \in \mathbb{R}$.

For a statistical manifold $(M, \nabla, h)$, we define the dual connection $\nabla^*$ with respect to $h$ by

$$X h(Y, Z) = h(\nabla_X Y, Z) + h(Y, \nabla^*_X Z).$$

The connection $\nabla^*$ is torsion-free and $\nabla^* h$ is also symmetric. Hence the triplet $(M, \nabla^*, h)$ is a statistical manifold. We call $(M, \nabla^*, h)$ the dual statistical manifold of $(M, \nabla, h)$.

Proposition 1.2. Let $(M, h)$ be a semi-Riemannian manifold and let $C$ be a totally symmetric (0, 3)-tensor field. Denote by $\nabla^{(0)}$ the Levi-Civita connection with respect to $h$. We define an affine connection $\nabla^{(\alpha)}$ by

$$h(\nabla^{(\alpha)}_X Y, Z) := h(\nabla^{(0)}_X Y, Z) - \frac{\alpha}{2}\,C(X, Y, Z).$$

Then the connections $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ are torsion-free affine connections mutually dual with respect to $h$, and the covariant derivative $\nabla^{(\alpha)} h$ is totally symmetric. Hence $(M, \nabla^{(\alpha)}, h)$ and $(M, \nabla^{(-\alpha)}, h)$ are statistical manifolds.


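The connection $\nabla$ is flat if and only if $\nabla^*$ is flat. In this case, we say that $(M, h, \nabla, \nabla^*)$ is a dually flat space. Since the connection $\nabla$ is flat, there exists an affine coordinate system $\{\theta^i\}$ on $M$. In addition, there exists a $\nabla^*$-affine coordinate system $\{\eta_i\}$ such that

$$h\left(\frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \eta_j}\right) = \delta_i^j.$$

We say that $\{\eta_i\}$ is the dual coordinate system of $\{\theta^i\}$ with respect to $h$.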


Proposition 1.3. Let $(M, h, \nabla, \nabla^*)$ be a dually flat space. Suppose that $\{\theta^i\}$ is a $\nabla$-affine coordinate system, and $\{\eta_i\}$ is the dual coordinate system of $\{\theta^i\}$. Then there exist functions $\psi$ and $\phi$ on $M$ such that

$$\frac{\partial \psi}{\partial \theta^i} = \eta_i, \quad \frac{\partial \phi}{\partial \eta_i} = \theta^i, \quad \psi(p) + \phi(p) - \sum_{i=1}^{n} \theta^i(p)\,\eta_i(p) = 0. \qquad (3)$$

In addition, the following formulas hold:

$$h_{ij} = \frac{\partial^2 \psi}{\partial \theta^i \partial \theta^j}, \quad h^{ij} = \frac{\partial^2 \phi}{\partial \eta_i \partial \eta_j}, \qquad (4)$$

where $(h_{ij})$ is the component matrix of a semi-Riemannian metric $h$ with respect to $\{\theta^i\}$, and $(h^{ij})$ is the inverse matrix of $(h_{ij})$.

The functions $\psi$ and $\phi$ are called the θ-potential and the η-potential, respectively. The relation (3) is called the Legendre transformation. From Equation (4), the semi-Riemannian metric $h$ is a Hessian metric. Hence we also say that $(M, \nabla, h)$ is a Hessian manifold [15].

Definition 1.1. We say that a function $\rho$ on $M \times M$ is the (canonical) divergence on $(M, h, \nabla, \nabla^*)$ if

$$\rho(p \| q) := \psi(p) + \phi(q) - \sum_{i=1}^{n} \theta^i(p)\,\eta_i(q), \quad (p, q \in M). \qquad (5)$$

We remark that the definition of $\rho$ is independent of the choice of affine coordinate system on $M$.
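For the normal family of Example 1.1, the canonical divergence (5) can be evaluated from $\theta$, $\eta$, $\psi$ and $\phi = \sum_i \theta^i \eta_i - \psi$, the Legendre relation (3). It should coincide with the Kullback-Leibler divergence with its arguments swapped, $\rho(p \| r) = \mathrm{KL}(r \| p)$. The sketch below checks this against the closed-form KL divergence between two normal distributions. The function names are hypothetical.

```python
import numpy as np

def normal_coords(mu, sigma):
    """theta, eta, psi and phi of N(mu, sigma^2) (Example 1.1 and Prop. 1.3)."""
    theta = np.array([mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2)])
    eta = np.array([mu, mu ** 2 + sigma ** 2])          # eta_i = E[F_i(x)]
    psi = mu ** 2 / (2.0 * sigma ** 2) + np.log(np.sqrt(2.0 * np.pi) * sigma)
    phi = theta @ eta - psi                              # Legendre relation (3)
    return theta, eta, psi, phi

def canonical_divergence(p, r):
    """rho(p || r) = psi(p) + phi(r) - sum_i theta^i(p) eta_i(r), as in (5)."""
    theta_p, _, psi_p, _ = normal_coords(*p)
    _, eta_r, _, phi_r = normal_coords(*r)
    return psi_p + phi_r - theta_p @ eta_r

def kl_normal(a, b):
    """Kullback-Leibler divergence KL(N(a) || N(b)) for a, b = (mu, sigma)."""
    (m1, s1), (m2, s2) = a, b
    return np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

p, r = (0.3, 1.2), (-0.5, 0.7)
print(canonical_divergence(p, r), kl_normal(r, p))
```

As a by-product, $\phi$ equals the negative differential entropy $-\frac{1}{2}\log(2\pi e\sigma^2)$.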

1.3.  Generalized  conformal  relations  on  statistical 
manifolds 

We  give  a  brief  summary  of  generalized  conformal  relations  on  statistical 
manifolds.  Generalized  conformal  structures  on  statistical  manifolds  have 
been  studied  in  affine  differential  geometry  (see  [5,  6,  7,  8]). 

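Definition 1.2. Suppose $(M, \nabla, h)$ and $(M, \bar\nabla, \bar h)$ are statistical manifolds. We say that $(M, \nabla, h)$ and $(M, \bar\nabla, \bar h)$ are conformally-projectively equivalent if there exist two functions $\kappa$ and $\lambda$ such that

$$\bar h(X, Y) = e^{\kappa + \lambda}\,h(X, Y),$$
$$\bar\nabla_X Y = \nabla_X Y - h(X, Y)\,\mathrm{grad}_h \lambda + d\kappa(Y)\,X + d\kappa(X)\,Y,$$

where $\mathrm{grad}_h \lambda$ is the gradient vector field of $\lambda$ with respect to $h$.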

In particular, for a constant $\alpha \in \mathbb{R}$, we say that two statistical manifolds are α-conformally equivalent if there exists a function $\lambda$ on $M$ such that

$$\bar h(X, Y) = e^{\lambda}\,h(X, Y),$$
$$\bar\nabla_X Y = \nabla_X Y - \frac{1+\alpha}{2}\,h(X, Y)\,\mathrm{grad}_h \lambda + \frac{1-\alpha}{2}\{d\lambda(Y)\,X + d\lambda(X)\,Y\}.$$

A statistical manifold $(M, \nabla, h)$ is called α-conformally flat if $(M, \nabla, h)$ is locally α-conformally equivalent to some flat statistical manifold.

We remark that the conformal-projective equivalence relation and the α-conformal equivalence relation are natural generalizations of the conformal equivalence relation for Riemannian manifolds. In fact, suppose that $(M, g)$ and $(M, \bar g)$ are Riemannian manifolds, and $\nabla^{(0)}$ and $\bar\nabla^{(0)}$ denote their Levi-Civita connections. If $g$ and $\bar g$ are conformally equivalent, then the following formulas hold:

$$\bar g(X, Y) = e^{2\lambda}\,g(X, Y),$$
$$\bar\nabla^{(0)}_X Y = \nabla^{(0)}_X Y - g(X, Y)\,\mathrm{grad}_g \lambda + d\lambda(Y)\,X + d\lambda(X)\,Y.$$

This implies that $(M, \nabla^{(0)}, g)$ and $(M, \bar\nabla^{(0)}, \bar g)$ are 0-conformally equivalent (take the function $2\lambda$ in the definition above with $\alpha = 0$).

To describe generalized conformal structures, let us introduce contrast functions. Let $\rho$ be a function on $M \times M$. We define a function on $M$ by

$$\rho[X_1 \cdots X_i | Y_1 \cdots Y_j](p) := (X_1)_p \cdots (X_i)_p (Y_1)_q \cdots (Y_j)_q\,\rho(p \| q)\big|_{p=q},$$

where $X_1, \ldots, X_i, Y_1, \ldots, Y_j$ are arbitrary vector fields on $M$. We call $\rho$ a contrast function on $M$ if

$$\rho(p \| p) = 0 \quad (p \in M),$$
$$\rho[X|] = \rho[|X] = 0,$$
$$h(X, Y) := -\rho[X|Y] \text{ is a semi-Riemannian metric on } M.$$

We remark that the canonical divergence on a dually flat space is a typical example of a contrast function.

For a given contrast function $\rho$ on $M$, we can define a torsion-free affine connection $\nabla$ by the following formula:

$$h(\nabla_X Y, Z) := -\rho[XY|Z].$$

The triplet $(M, \nabla, h)$ is a statistical manifold. We say that $(M, \nabla, h)$ is induced from the contrast function $\rho$. If we exchange the arguments as $\rho^*(p \| q) := \rho(q \| p)$, then $\rho^*$ is also a contrast function and induces the dual statistical manifold $(M, \nabla^*, h)$. For geometry of contrast functions, the following results are known ([7, 8]).

Proposition 1.4. Let $\rho$ and $\bar\rho$ be contrast functions on $M$, and let $\lambda$ be a function on $M$. Suppose that $(M, \nabla, h)$ and $(M, \bar\nabla, \bar h)$ are statistical manifolds induced from $\rho$ and $\bar\rho$, respectively.

(1) If $\bar\rho(p \| q) = e^{\lambda(p)}\rho(p \| q)$, then the two statistical manifolds $(M, \nabla, h)$ and $(M, \bar\nabla, \bar h)$ are $(-1)$-conformally equivalent.

(2) If $\bar\rho(p \| q) = e^{\lambda(q)}\rho(p \| q)$, then the two statistical manifolds $(M, \nabla, h)$ and $(M, \bar\nabla, \bar h)$ are 1-conformally equivalent.
2. Geometry for q-exponential families

In this section, we discuss geometry of q-exponential families. A q-exponential family is a generalization of the standard exponential family. We will consider conformal relations between the standard information geometry and the q-Fisher geometry.


2.1. The q-escort probability and the q-expectation

To begin with, we review the notion of the escort probability and the q-expectation. Suppose that $p(x)$ is a probability distribution on $\mathcal{X}$. For a fixed number $q$, we define the q-escort distribution $P_q(x)$ of $p(x)$ by

$$P_q(x) := \frac{1}{\Omega_q(p)}\,p(x)^q, \quad \Omega_q(p) := \int_{\mathcal{X}} p(x)^q\,dx.$$

Let $f(x)$ be a random variable on $\mathcal{X}$. The q-expectation of $f(x)$ is the expectation with respect to the q-escort distribution, that is,

$$E_{q,p}[f(x)] := \int_{\mathcal{X}} f(x)\,P_q(x)\,dx = \frac{1}{\Omega_q(p)} \int_{\mathcal{X}} f(x)\,p(x)^q\,dx.$$

If the sample space $\mathcal{X}$ is discrete, the q-escort distribution and the q-expectation can be defined by replacing the integral $\int \cdots dx$ with the sum $\sum_{x \in \mathcal{X}}$.


2.2. The q-exponential family

Next, we define the q-exponential and the q-logarithm. Suppose that $q$ is a fixed positive number. Then the q-exponential function is defined by

$$\exp_q x := \begin{cases} (1 + (1-q)x)^{\frac{1}{1-q}}, & q \neq 1,\; (1 + (1-q)x > 0), \\ \exp x, & q = 1, \end{cases} \qquad (6)$$

and the q-logarithm function by

$$\log_q x := \begin{cases} \dfrac{x^{1-q} - 1}{1-q}, & q \neq 1,\; (x > 0), \\ \log x, & q = 1. \end{cases}$$

In the limit $q \to 1$, the q-exponential and the q-logarithm recover the standard exponential and the standard logarithm, respectively. For simplicity, we assume that the variable $x$ in (6) satisfies the condition $1 + (1-q)x > 0$ whenever we consider the q-exponential function. Hence the q-exponential and q-logarithm functions are always mutually inverse.
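The two functions, their domain conditions and the $q \to 1$ limit can be written down directly. The sketch below is a minimal implementation of (6) and the q-logarithm. The function names and the choice to raise `ValueError` outside the domain are our own.

```python
import math

def exp_q(x, q):
    """q-exponential (6); requires 1 + (1 - q) x > 0 when q != 1."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        raise ValueError("exp_q needs 1 + (1 - q) x > 0")
    return base ** (1.0 / (1.0 - q))

def log_q(x, q):
    """q-logarithm; requires x > 0."""
    if x <= 0.0:
        raise ValueError("log_q needs x > 0")
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

# Round trip on the admissible domain, and behaviour close to q = 1.
print(log_q(exp_q(0.8, 0.5), 0.5))
print(exp_q(0.7, 1.0 - 1e-8), math.exp(0.7))
```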
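Definition 2.1. A statistical model $S_q = \{p(x; \theta) \mid \theta \in \Theta \subset \mathbb{R}^n\}$ is called a q-exponential family if

$$S_q := \left\{ p(x; \theta) \;\middle|\; p(x; \theta) = \exp_q\left[ \sum_{i=1}^{n} \theta^i F_i(x) - \psi(\theta) \right],\; \theta \in \Theta \subset \mathbb{R}^n \right\},$$

where $F_1(x), \ldots, F_n(x)$ are random variables on the sample space $\mathcal{X}$, and $\psi(\theta)$ is a function on the parameter space $\Theta$.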


The information geometric structure of the q-exponential family is closely related to the $(1 - 2q)$- and the $(2q - 1)$-connections. Hence we fix the relation between the two parameters $q$ and $\alpha$ as $\alpha = 1 - 2q$.


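Example 2.1 (q-normal distributions). A q-normal distribution is the probability distribution defined by the following formula:

$$p_q(x; \mu, \sigma) := \frac{1}{Z_q(\sigma)} \left[ 1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2} \right]_+^{\frac{1}{1-q}},$$

where $[*]_+ = \max\{0, *\}$, $\{\mu, \sigma\}$ are parameters with $-\infty < \mu < \infty$, $0 < \sigma < \infty$, and $Z_q(\sigma)$ is the normalization defined by

$$Z_q(\sigma) := \begin{cases} \sqrt{\dfrac{3-q}{1-q}}\;\mathrm{Beta}\left(\dfrac{2-q}{1-q}, \dfrac{1}{2}\right)\sigma, & (-\infty < q < 1), \\[2ex] \sqrt{\dfrac{3-q}{q-1}}\;\mathrm{Beta}\left(\dfrac{3-q}{2(q-1)}, \dfrac{1}{2}\right)\sigma, & (1 < q < 3). \end{cases}$$

Set

$$\theta^1 = \frac{2}{3-q}\,Z_q(\sigma)^{q-1}\,\frac{\mu}{\sigma^2}, \quad \theta^2 = -\frac{1}{3-q}\,Z_q(\sigma)^{q-1}\,\frac{1}{\sigma^2},$$
$$\psi(\theta) = -\frac{(\theta^1)^2}{4\theta^2} - \frac{Z_q(\sigma)^{q-1} - 1}{1-q};$$

then, on the support of $p_q$,

$$\log_q p_q(x) = \frac{1}{1-q}\left(p_q(x)^{1-q} - 1\right) = \frac{1}{1-q}\left\{ \frac{1}{Z_q(\sigma)^{1-q}} \left( 1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2} \right) - 1 \right\}$$
$$= \frac{2\mu Z_q(\sigma)^{q-1}}{(3-q)\sigma^2}\,x - \frac{Z_q(\sigma)^{q-1}}{(3-q)\sigma^2}\,x^2 - \frac{Z_q(\sigma)^{q-1}}{3-q}\,\frac{\mu^2}{\sigma^2} + \frac{Z_q(\sigma)^{q-1} - 1}{1-q}$$
$$= \theta^1 x + \theta^2 x^2 - \psi(\theta).$$

This implies that the set of q-normal distributions is a q-exponential family.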
We remark that q-normal distributions include several important probability distributions. If $q = 1$, then the q-normal distribution is, of course, the normal distribution. If $q = 2$, then the distribution is the Cauchy distribution. If $q = 1 + 2/(n+1)$, then the distribution is Student's t-distribution with $n$ degrees of freedom. We also remark that mathematical properties of q-normal distributions have been obtained by several authors. See [16, 17], for example.
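The formulas of Example 2.1 can be checked numerically. The sketch below tests that $p_q$ integrates to one with the stated $Z_q(\sigma)$, that $\log_q p_q(x) = \theta^1 x + \theta^2 x^2 - \psi(\theta)$ holds on the support, and that $q = 1 + 2/(n+1)$ reproduces Student's t density. The Beta function is computed from `math.lgamma`. The integration grid is an illustrative choice: wide enough for the $q = 1.5$ tails and fine enough near the centre. All function names are hypothetical.

```python
import math
import numpy as np

def beta_fn(a, b):
    """Euler Beta function via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def z_q(sigma, q):
    """Normalization Z_q(sigma) of Example 2.1 (q < 1 or 1 < q < 3)."""
    if q < 1:
        return math.sqrt((3 - q) / (1 - q)) * beta_fn((2 - q) / (1 - q), 0.5) * sigma
    if 1 < q < 3:
        return math.sqrt((3 - q) / (q - 1)) * beta_fn((3 - q) / (2 * (q - 1)), 0.5) * sigma
    raise ValueError("need q < 1 or 1 < q < 3")

def q_normal_pdf(x, mu, sigma, q):
    base = 1.0 - (1 - q) / (3 - q) * (x - mu) ** 2 / sigma ** 2
    return np.maximum(base, 0.0) ** (1.0 / (1 - q)) / z_q(sigma, q)   # [.]_+

def log_q(x, q):
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def natural_params(mu, sigma, q):
    """theta^1, theta^2 and psi(theta) as set in Example 2.1."""
    c = z_q(sigma, q) ** (q - 1) / ((3 - q) * sigma ** 2)
    th1, th2 = 2 * c * mu, -c
    psi = -th1 ** 2 / (4 * th2) - (z_q(sigma, q) ** (q - 1) - 1) / (1 - q)
    return th1, th2, psi

def student_t_pdf(x, nu):
    """Student's t density with nu degrees of freedom."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    return c * (1 + x ** 2 / nu) ** (-(nu + 1) / 2)

grid = np.linspace(-400.0, 400.0, 800001)
for q in (0.5, 1.5):
    y = q_normal_pdf(grid, 0.4, 1.3, q)
    mass = np.sum((y[1:] + y[:-1]) * np.diff(grid)) / 2.0   # trapezoidal rule
    print(q, mass)
```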


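Example 2.2 (discrete distributions). Suppose that the sample space $\mathcal{X}$ is a finite discrete set with $n + 1$ elements, labeled $1, \ldots, n+1$. Then the set of all probability distributions on $\mathcal{X}$ is given by

$$\left\{ p(x; \eta) \;\middle|\; \eta_i > 0,\; \sum_{i=1}^{n+1} \eta_i = 1,\; p(x; \eta) = \sum_{i=1}^{n+1} \eta_i\,\delta_i(x) \right\},$$

where $\delta_i(x)$ equals one if $x = i$ and zero otherwise. Set

$$\theta^i = \frac{1}{1-q}\left\{ (\eta_i)^{1-q} - (\eta_{n+1})^{1-q} \right\}, \quad \psi(\theta) = -\log_q \eta_{n+1};$$

then, using $\delta_{n+1}(x) = 1 - \sum_{i=1}^{n} \delta_i(x)$, we obtain

$$\log_q p(x) = \frac{1}{1-q}\left\{ p(x)^{1-q} - 1 \right\} = \frac{1}{1-q}\left\{ \sum_{i=1}^{n+1} (\eta_i)^{1-q}\,\delta_i(x) - 1 \right\}$$
$$= \frac{1}{1-q}\left\{ \sum_{i=1}^{n} \left( (\eta_i)^{1-q} - (\eta_{n+1})^{1-q} \right) \delta_i(x) + (\eta_{n+1})^{1-q} - 1 \right\}$$
$$= \sum_{i=1}^{n} \theta^i\,\delta_i(x) - \psi(\theta).$$

This implies that the set of discrete distributions is a q-exponential family. We note that this also holds in the case $q = 1$, that is, the set of discrete distributions is an exponential family.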
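The representation in Example 2.2 can be verified outcome by outcome. The minimal sketch below computes $\theta^i$ and $\psi(\theta)$ from a probability vector and compares $\log_q p(x)$ with $\sum_i \theta^i \delta_i(x) - \psi(\theta)$. In the code the outcomes are indexed $0, \ldots, n$, so the last index plays the role of $n + 1$. The names are illustrative.

```python
import numpy as np

def log_q(x, q):
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def discrete_natural_params(probs, q):
    """theta^i (i = 1..n) and psi(theta) of Example 2.2; probs = (eta_1, ..., eta_{n+1})."""
    theta = (probs[:-1] ** (1.0 - q) - probs[-1] ** (1.0 - q)) / (1.0 - q)
    psi = -log_q(probs[-1], q)
    return theta, psi

probs = np.array([0.1, 0.25, 0.4, 0.25])   # n + 1 = 4 outcomes
q = 0.6
theta, psi = discrete_natural_params(probs, q)
for x in range(len(probs)):
    delta = np.zeros(len(probs) - 1)       # (delta_1(x), ..., delta_n(x))
    if x < len(probs) - 1:
        delta[x] = 1.0
    print(x, log_q(probs[x], q), theta @ delta - psi)
```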


2.3. Geometry for q-exponential families

For a q-exponential family $S_q = \{p(x; \theta)\}$, we assume that the potential function $\psi$ is strictly convex. We define the q-Fisher metric and the q-cubic form in the same manner as for exponential families, (1) and (2):

$$g^q_{ij}(\theta) = \partial_i \partial_j \psi(\theta),$$
$$C^q_{ijk}(\theta) = \partial_i \partial_j \partial_k \psi(\theta).$$
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Since  gq  is  a  Hessian  metric  on  {S^},  we  can  define  a  flat  affine  connection 

Vg(e)  _  yg(l)  by 

9q^ie)Y,  Z)  =  gq(X/f' ]Y,  Z)  -  \cq{X,  Y,  Z), 

where  V9^  is  the  Levi-Civita  connection  with  respect  to  the  q-Fisher  met¬ 
ric  gq.  In  this  case,  the  parameters  {91}  is  a  V^^-affine  coordinate  system. 
We  denote  by  V9^1™)  the  dual  connection  of  V9^®^  with  respect  to  gq.  We 
call  V9(el  the  q-exponential  connection  and  V9(m)  the  q-mixture  connection. 

Since  V9(e)  is  flat,  then  X7q(m'>  is  also  flat.  Hence  we  immediately  obtain 
the  following  proposition. 

Proposition 2.1. Let $S_q$ be a q-exponential family. Then the tetrad
$(S_q, g^q, \nabla^{q(e)}, \nabla^{q(m)})$ is a dually flat space.

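Let $S_q$ be a q-exponential family. From a direct calculation, we have

$$\partial_i p(x;\theta) = p(x;\theta)^q \big(F_i(x) - \partial_i \psi(\theta)\big),$$

where $\partial_i = \partial/\partial\theta^i$. Since $\int_X \partial_i p(x;\theta)\,dx = \partial_i \int_X p(x;\theta)\,dx = 0$, we obtain

$$\partial_i \psi(\theta) = \frac{1}{\Omega_q(p)} \int_X F_i(x)\, p(x;\theta)^q\,dx = \int_X F_i(x)\, P_q(x)\,dx,$$

where $\Omega_q(p) = \int_X p(x;\theta)^q\,dx$ is the normalization and
$P_q(x) = p(x;\theta)^q / \Omega_q(p)$ is the q-escort distribution of $p$.

This implies that the q-mixture parameters are given by the q-expectations
of the random variables $\{F_i\}$. Hence we conclude

Proposition 2.2. Let $S_q$ be a q-exponential family. Then the q-mixture
parameters $\{\eta_i\}$ are given by the q-expectations of the random variables
$F_i(x)$, that is,

$$\eta_i = \frac{\partial \psi}{\partial \theta^i} = \int_X F_i(x)\, P_q(x;\theta)\,dx.$$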
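Proposition 2.2 can be checked numerically on the discrete family of Example 2.2, where $F_i = \delta_i$ and the q-expectation of $\delta_i$ is the escort probability $p_i^q / \sum_j p_j^q$. The sketch below rests on our own assumptions ($q > 1$, bisection for the normalization, central finite differences; all names are hypothetical): it recovers $p$ from $\theta$ by solving $\sum_x p(x) = 1$, evaluates $\psi(\theta) = -\log_q p_{n+1}$, and compares the numerical gradient of $\psi$ with the escort probabilities.

```python
def log_q(x, q):
    """q-logarithm (x**(1 - q) - 1) / (1 - q), for x > 0 and q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def distribution_from_theta(theta, q, iters=200):
    """Recover (p_1, ..., p_{n+1}) from theta for Example 2.2, assuming q > 1.

    With t = p_{n+1}**(1 - q) we have p_i = ((1 - q) * theta_i + t)**(1 / (1 - q)).
    For q > 1 every p_i decreases in t, so sum(p) = 1 has a unique root,
    which is located by bisection.
    """
    a = 1.0 - q  # negative because q > 1

    def probs(t):
        return [(a * th + t) ** (1.0 / a) for th in theta] + [t ** (1.0 / a)]

    lo = max([0.0] + [-a * th for th in theta])  # every base must stay positive
    hi = lo + 1.0
    while sum(probs(hi)) > 1.0:  # widen the bracket until the total drops below 1
        hi = lo + 2.0 * (hi - lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs(0.5 * (lo + hi))


def psi(theta, q):
    """Potential psi(theta) = -log_q p_{n+1}."""
    return -log_q(distribution_from_theta(theta, q)[-1], q)


def escort(p, q):
    """Escort distribution P_q(x) = p(x)**q / Omega_q(p)."""
    omega = sum(pi ** q for pi in p)
    return [pi ** q / omega for pi in p]


def grad_psi(theta, q, h=1e-6):
    """Central-difference gradient of psi with respect to theta."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((psi(up, q) - psi(down, q)) / (2.0 * h))
    return grad


q = 2.0
p = [0.2, 0.3, 0.5]
theta = [(pi ** (1 - q) - p[-1] ** (1 - q)) / (1 - q) for pi in p[:-1]]
print(grad_psi(theta, q))  # eta_i = d psi / d theta^i
print(escort(p, q)[:-1])   # q-expectations of delta_1, ..., delta_n
```

Restricting to $q > 1$ keeps the normalization equation monotone in $t$; for $0 < q < 1$ the same idea works with the bisection direction reversed.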

Next,  we  consider  relations  between  the  standard  Fisher  structure  and 
the  q-Fisher  structure  from  the  viewpoint  of  contrast  functions. 

For a q-exponential family $S_q$, we denote by $\rho_q$ the canonical
divergence (5).

Proposition 2.3. Let $S_q$ be a q-exponential family. Then the canonical
divergence $\rho_q$ on $S_q$ is given by

$$\rho_q(p(\theta')\|p(\theta)) = E_{q,p(\theta)}\big[\log_q p(\theta) - \log_q p(\theta')\big].$$
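Proof. Since $(S_q, g^q, \nabla^{q(e)}, \nabla^{q(m)})$ is a dually flat space, the q-Fisher
metric has a potential function $\psi$. We denote by $\phi$ the dual potential function
of $\psi$. For probability distributions $p(\theta)$ and $p(\theta')$ in $S_q$, using the Legendre
duality (3), we obtain

$$\begin{aligned}
E_{q,p(\theta)}\big[\log_q p(\theta) - \log_q p(\theta')\big]
&= \int_X \Big( \sum_{i=1}^n \theta^i F_i(x) - \psi(\theta) - \sum_{i=1}^n \theta'^i F_i(x) + \psi(\theta') \Big) P_q(x;\theta)\,dx \\
&= \sum_{i=1}^n \theta^i \eta_i - \psi(\theta) - \sum_{i=1}^n \theta'^i \eta_i + \psi(\theta') \\
&= \psi(\theta') + \phi(\eta) - \sum_{i=1}^n \theta'^i \eta_i \\
&= \rho_q(p(\theta')\|p(\theta)). \qquad \square
\end{aligned}$$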
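On the discrete family of Example 2.2 every quantity in Proposition 2.3 has a closed form, so both sides can be compared directly. The following Python sketch is our own illustration (function names hypothetical): it computes $\rho_q(p(\theta')\|p(\theta)) = \psi(\theta') + \phi(\eta) - \sum_i \theta'^i \eta_i$ from the parameters of Example 2.2, with $\eta_i$ taken as the escort probabilities given by Proposition 2.2, and compares it with the escort expectation $E_{q,p(\theta)}[\log_q p(\theta) - \log_q p(\theta')]$.

```python
def log_q(x, q):
    """q-logarithm (x**(1 - q) - 1) / (1 - q), for x > 0 and q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def escort(p, q):
    """Escort distribution P_q(x) = p(x)**q / sum_y p(y)**q."""
    omega = sum(pi ** q for pi in p)
    return [pi ** q / omega for pi in p]


def natural_params(p, q):
    """theta^i and psi(theta) of Example 2.2."""
    theta = [(pi ** (1 - q) - p[-1] ** (1 - q)) / (1 - q) for pi in p[:-1]]
    return theta, -log_q(p[-1], q)


def dual_params(p, q):
    """eta_i = escort probability of i (Proposition 2.2) and phi = sum theta^i eta_i - psi."""
    theta, psi = natural_params(p, q)
    eta = escort(p, q)[:-1]
    phi = sum(t * e for t, e in zip(theta, eta)) - psi
    return eta, phi


def canonical_divergence(p_prime, p, q):
    """rho_q(p(theta') || p(theta)) = psi(theta') + phi(eta) - sum_i theta'^i eta_i."""
    theta_prime, psi_prime = natural_params(p_prime, q)
    eta, phi = dual_params(p, q)
    return psi_prime + phi - sum(t * e for t, e in zip(theta_prime, eta))


def escort_expectation_gap(p, p_prime, q):
    """E_{q,p}[log_q p - log_q p'], the right-hand side of Proposition 2.3."""
    return sum(w * (log_q(a, q) - log_q(b, q))
               for w, a, b in zip(escort(p, q), p, p_prime))


p, p_prime, q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2], 0.7
print(canonical_divergence(p_prime, p, q), escort_expectation_gap(p, p_prime, q))
```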

We remark that the canonical divergence $\rho_q(p(\theta)\|p(\theta'))$ induces the
statistical manifold $(S_q, \nabla^{q(e)}, g^q)$ and the dual divergence
$\rho_q^*(p(\theta)\|p(\theta')) := \rho_q(p(\theta')\|p(\theta))$ induces $(S_q, \nabla^{q(m)}, g^q)$. The
q-exponential family also has another divergence, called the divergence of
Csiszár type $\rho_q^C$, which is defined by

$$\rho_q^C(p(\theta)\|p(\theta')) := \frac{1}{1-q}\Big\{1 - \int_X p(\theta)^q\, p(\theta')^{1-q}\,dx\Big\}.$$

This is q times the $(1-2q)$-divergence (the α-divergence with $\alpha = 1-2q$) in
information geometry. The divergence $(1/q)\rho_q^C$ induces the statistical
manifold $(S_q, \nabla^{(1-2q)}, g^F)$.

Proposition 2.4. Suppose that $\rho_q$ and $\rho_q^C$ are the canonical divergence
and the divergence of Csiszár type on a q-exponential family, respectively.
Denote by $\Omega_q(p(\theta))$ the normalization for the q-escort distribution of $p(\theta)$.
Then $\rho_q$ and $\rho_q^C$ satisfy

$$\rho_q(p(\theta')\|p(\theta)) = \frac{1}{\Omega_q(p(\theta))}\,\rho_q^C(p(\theta)\|p(\theta')).$$

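Proof. From Proposition 2.3 we obtain

$$\begin{aligned}
\rho_q(p(\theta')\|p(\theta)) &= E_{q,p(\theta)}\big[\log_q p(\theta) - \log_q p(\theta')\big] \\
&= \int_X \Big( \frac{p(\theta)^{1-q} - 1}{1-q} - \frac{p(\theta')^{1-q} - 1}{1-q} \Big) \frac{p(\theta)^q}{\Omega_q(p(\theta))}\,dx \\
&= \frac{1 - \int_X p(\theta)^q\, p(\theta')^{1-q}\,dx}{(1-q)\,\Omega_q(p(\theta))} \\
&= \frac{1}{\Omega_q(p(\theta))}\,\rho_q^C(p(\theta)\|p(\theta')). \qquad \square
\end{aligned}$$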
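Proposition 2.4 and the remark relating $\rho_q^C$ to the α-divergence can both be checked on finite sample spaces. In the sketch below (our own illustration; all names are hypothetical), `canonical_divergence` evaluates the left-hand side through the escort-expectation formula of Proposition 2.3, and `alpha_divergence` uses Amari's form $\frac{4}{1-\alpha^2}\{1 - \sum_x p^{(1-\alpha)/2} r^{(1+\alpha)/2}\}$ for $\alpha \neq \pm 1$.

```python
def log_q(x, q):
    """q-logarithm (x**(1 - q) - 1) / (1 - q), for x > 0 and q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def omega_q(p, q):
    """Normalization Omega_q(p) = sum_x p(x)**q of the escort distribution."""
    return sum(pi ** q for pi in p)


def canonical_divergence(p_prime, p, q):
    """rho_q(p' || p) via Proposition 2.3: E_{q,p}[log_q p - log_q p']."""
    om = omega_q(p, q)
    return sum(pi ** q / om * (log_q(pi, q) - log_q(ri, q)) for pi, ri in zip(p, p_prime))


def csiszar_divergence(p, p_prime, q):
    """rho_q^C(p || p') = (1 - sum_x p(x)**q * p'(x)**(1 - q)) / (1 - q)."""
    return (1.0 - sum(a ** q * b ** (1.0 - q) for a, b in zip(p, p_prime))) / (1.0 - q)


def alpha_divergence(p, r, alpha):
    """Amari's alpha-divergence (alpha != +-1) on a finite sample space."""
    s = sum(a ** ((1 - alpha) / 2) * b ** ((1 + alpha) / 2) for a, b in zip(p, r))
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - s)


p, p_prime = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
for q in (0.7, 1.8):
    lhs = canonical_divergence(p_prime, p, q)
    rhs = csiszar_divergence(p, p_prime, q) / omega_q(p, q)  # Proposition 2.4
    scaled = q * alpha_divergence(p, p_prime, 1 - 2 * q)     # q times the (1-2q)-divergence
    print(q, lhs, rhs, scaled, csiszar_divergence(p, p_prime, q))
```

With $\alpha = 1 - 2q$ the exponents become $q$ and $1-q$ and the prefactor becomes $1/(q(1-q))$, which is why multiplying by $q$ recovers $\rho_q^C$.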

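Theorem 2.1. For a q-exponential family $S_q$, the statistical manifolds
$(S_q, \nabla^{q(e)}, g^q)$ and $(S_q, \nabla^{(2q-1)}, g^F)$ are 1-conformally equivalent.

Proof. Recall that $\rho_q(p(\theta)\|p(\theta'))$ induces $(S_q, \nabla^{q(e)}, g^q)$. From the duality
of contrast functions, $(1/q)\rho_q^{C*}(p(\theta)\|p(\theta')) = (1/q)\rho_q^C(p(\theta')\|p(\theta))$ induces
$(S_q, \nabla^{(2q-1)}, g^F)$. From Proposition 2.4, we have

$$\rho_q(p(\theta)\|p(\theta')) = \frac{q}{\Omega_q(p(\theta'))} \cdot \frac{1}{q}\,\rho_q^{C*}(p(\theta)\|p(\theta')).$$

This implies that the two statistical manifolds are 1-conformally equivalent
by Proposition 1.4. □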

We remark that this theorem was already obtained in the case where the
sample space $X$ is discrete ([13, 14]). For the dual statistical manifolds, we
immediately obtain the following corollary.

Corollary 2.1. For a q-exponential family $S_q$, the two statistical manifolds
$(S_q, \nabla^{q(m)}, g^q)$ and $(S_q, \nabla^{(1-2q)}, g^F)$ are $(-1)$-conformally equivalent.

Since $(S_q, g^q, \nabla^{q(e)}, \nabla^{q(m)})$ is dually flat, we also obtain the following
corollary.

Corollary 2.2. For a q-exponential family $S_q$, the statistical manifold
$(S_q, \nabla^{(2q-1)}, g^F)$ is 1-conformally flat, and $(S_q, \nabla^{(1-2q)}, g^F)$ is
$(-1)$-conformally flat.

For generalizations of exponential families, several results have been
obtained in more general frameworks (see [4, 10, 11, 12]). If we consider
relations between the standard Fisher geometry and dually flat structures
for them as in this paper, some suitable assumptions may be required.


3.  An  application  to  statistical  inferences 

In this section, we discuss an application of the geometry of q-exponential
families to statistical inference, following the author's expository report [9].
3.1. Generalization of independence

First, let us recall the independence of random variables. Suppose that X
and Y are random variables with probability density functions
$p_1(x)$ and $p_2(y)$, respectively. We say that X and Y are independent if the
joint probability density function $p(x,y)$ is given by the product of the
marginal probability density functions, that is,

$$p(x,y) = p_1(x)\,p_2(y).$$

We assume that $p_1(x)$ and $p_2(y)$ are positive everywhere on the sample
space. Then the above equation can be written as follows:

$$p(x,y) = p_1(x)\,p_2(y) = \exp\big[\log p_1(x) + \log p_2(y)\big].$$


This implies that the notion of independence depends on the duality between the
exponential function and the logarithm function, that is, on the law of exponents.
Hence we can generalize the notion of independence from the viewpoint of
q-exponential functions.

For a fixed positive number q, we assume that $x > 0$, $y > 0$ and
$x^{1-q} + y^{1-q} - 1 > 0$. The q-product [2] of x and y is defined by

$$x \otimes_q y := \big[x^{1-q} + y^{1-q} - 1\big]^{\frac{1}{1-q}}.$$

The following properties follow from the definition of the q-product:

$$\exp_q x \otimes_q \exp_q y = \exp_q(x + y), \qquad \log_q(x \otimes_q y) = \log_q x + \log_q y.$$
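The two properties of the q-product can be checked numerically. The Python sketch below is an illustration under our own conventions: it implements only the branch $1 + (1-q)x > 0$ of $\exp_q$ (the inverse of $\log_q$ there) and raises an error outside the domain assumption $x^{1-q} + y^{1-q} - 1 > 0$ stated above.

```python
def exp_q(x, q):
    """q-exponential [1 + (1 - q) x]**(1 / (1 - q)); only the branch 1 + (1 - q) x > 0."""
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        raise ValueError("exp_q: this sketch requires 1 + (1 - q) x > 0")
    return base ** (1.0 / (1.0 - q))


def log_q(x, q):
    """q-logarithm (x**(1 - q) - 1) / (1 - q), for x > 0 and q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_product(x, y, q):
    """x (x)_q y = [x**(1 - q) + y**(1 - q) - 1]**(1 / (1 - q)) under the positivity assumption."""
    if x <= 0.0 or y <= 0.0:
        raise ValueError("q-product needs x > 0 and y > 0")
    s = x ** (1.0 - q) + y ** (1.0 - q) - 1.0
    if s <= 0.0:
        raise ValueError("q-product needs x**(1 - q) + y**(1 - q) - 1 > 0")
    return s ** (1.0 / (1.0 - q))


q = 0.5
print(q_product(exp_q(0.3, q), exp_q(0.4, q), q), exp_q(0.7, q))    # law of exponents
print(log_q(q_product(2.0, 3.0, q), q), log_q(2.0, q) + log_q(3.0, q))
print(q_product(2.0, 3.0, 1.0 - 1e-6))                               # close to 2 * 3 as q -> 1
```

The positivity checks come before any power is taken, since a negative base raised to a non-integer float exponent yields a complex number in Python.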


Let us define the notion of q-independence. We say that X and Y are
q-independent with m-normalization (mixture normalization) if the joint
probability density function $p_q(x,y)$ is given by the q-product of the
marginal probability density functions, that is,

$$p_q(x,y) = \frac{p_1(x) \otimes_q p_2(y)}{Z_{p_1,p_2}},$$

where $Z_{p_1,p_2}$ is the normalization defined by

$$Z_{p_1,p_2} = \iint_{XY} p_1(x) \otimes_q p_2(y)\,dx\,dy.$$

Since the q-product of probability density functions $p_1(x) \otimes_q p_2(y)$ is not a
probability density in general, a suitable normalization is required [4].
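For finite sample spaces the m-normalization can be computed directly, with the double integral defining $Z_{p_1,p_2}$ replaced by a double sum (an assumption of this sketch, as is the choice $q > 1$: then $p^{1-q} \ge 1$ for every probability $p \le 1$, so the domain condition of the q-product always holds). The names below are hypothetical.

```python
def q_product(x, y, q):
    """x (x)_q y = [x**(1 - q) + y**(1 - q) - 1]**(1 / (1 - q)) under the positivity assumption."""
    if x <= 0.0 or y <= 0.0:
        raise ValueError("q-product needs x > 0 and y > 0")
    s = x ** (1.0 - q) + y ** (1.0 - q) - 1.0
    if s <= 0.0:
        raise ValueError("q-product needs x**(1 - q) + y**(1 - q) - 1 > 0")
    return s ** (1.0 / (1.0 - q))


def q_independent_joint(p1, p2, q):
    """Joint pmf (p1(x) (x)_q p2(y)) / Z with Z the sum of all q-products (m-normalization)."""
    raw = [[q_product(a, b, q) for b in p2] for a in p1]
    z = sum(sum(row) for row in raw)
    return [[v / z for v in row] for row in raw], z


joint, z = q_independent_joint([0.5, 0.5], [0.2, 0.8], q=1.5)
print(z)                               # the raw q-products do not sum to one
print(sum(sum(row) for row in joint))  # the normalized joint does
```

As $q \to 1$ the q-product tends to the ordinary product, $Z_{p_1,p_2} \to 1$, and the construction reduces to the usual independent joint distribution.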
3.2. Geometry for q-likelihood estimators

Let $S = \{p(x;\xi) \mid \xi \in \Xi\}$ be a statistical model, and let $\{x_1, \dots, x_N\}$ be
N independent observations generated from a probability density function
$p(x;\xi) \in S$. We define the q-likelihood function [16] $L_q(\xi)$ by

$$L_q(\xi) = p(x_1;\xi) \otimes_q p(x_2;\xi) \otimes_q \cdots \otimes_q p(x_N;\xi).$$

In the limit $q \to 1$, the q-likelihood function $L_q$ is the standard likelihood
function on $\Xi$. Although $L_q$ may not be a probability density on $\Xi$, we regard
$L_q$ as a generalization of the likelihood function.

Since q-logarithm functions are strictly increasing, it is equivalent to
consider the q-logarithm q-likelihood function [3]

$$\log_q L_q(\xi) = \sum_{i=1}^N \log_q p(x_i;\xi).$$

We say that $\hat{\xi}$ is the maximum q-likelihood estimator if

$$\hat{\xi} = \arg\max_{\xi \in \Xi} L_q(\xi) = \arg\max_{\xi \in \Xi} \log_q L_q(\xi).$$

Now let us consider the q-likelihood estimator for q-exponential families. Let
$S_q$ be a q-exponential family and let $M$ be a curved q-exponential family
in $S_q$. Suppose that $\{x_1, \dots, x_N\}$ are q-independent observations generated
from $p(x;u) = p(x;\theta(u)) \in M$.

Then the q-logarithm q-likelihood function is calculated as

$$\begin{aligned}
\log_q L_q(u) &= \sum_{j=1}^N \log_q p(x_j;u) = \sum_{j=1}^N \Big( \sum_{i=1}^n \theta^i(u) F_i(x_j) - \psi(\theta(u)) \Big) \\
&= \sum_{i=1}^n \theta^i(u) \sum_{j=1}^N F_i(x_j) - N\psi(\theta(u)).
\end{aligned}$$

The q-logarithm q-likelihood equation is

$$\partial_i \log_q L_q(u) = \sum_{j=1}^N F_i(x_j) - N\,\partial_i \psi(\theta(u)) = 0.$$

Thus, the q-likelihood estimator for $S_q$ is given by

$$\hat{\eta}_i = \frac{1}{N} \sum_{j=1}^N F_i(x_j).$$
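For the discrete family of Example 2.2 ($F_i = \delta_i$), the estimator above says that the q-mixture parameters, that is, the escort probabilities (Proposition 2.2), equal the empirical frequencies $n_i/N$. Solving $p_i^q / \sum_j p_j^q = n_i/N$ gives $p_i \propto n_i^{1/q}$. The sketch below (our own illustration, assuming all counts are positive; names hypothetical) computes this closed form and checks that it maximizes $\log_q L_q = \sum_i n_i \log_q p_i$ against small moves inside the simplex.

```python
def log_q(x, q):
    """q-logarithm (x**(1 - q) - 1) / (1 - q), for x > 0 and q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def escort(p, q):
    """Escort distribution p(x)**q / sum_y p(y)**q."""
    omega = sum(pi ** q for pi in p)
    return [pi ** q / omega for pi in p]


def log_q_likelihood(p, counts, q):
    """log_q L_q = sum_j log_q p(x_j) = sum_i n_i log_q p_i for a sample with counts n_i."""
    return sum(n * log_q(pi, q) for n, pi in zip(counts, p))


def mq_likelihood_estimate(counts, q):
    """Closed-form maximizer: escort(p) = n_i / N, i.e. p_i proportional to n_i**(1/q)."""
    w = [n ** (1.0 / q) for n in counts]
    total = sum(w)
    return [wi / total for wi in w]


counts, q = [5, 3, 2], 0.7
p_hat = mq_likelihood_estimate(counts, q)
print(p_hat)
print(escort(p_hat, q))  # equals the empirical frequencies n_i / N
```

Because $\log_q$ is concave for $q > 0$, the objective is concave on the simplex, so this stationary point is the unique maximizer.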
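On the other hand, the canonical divergence can be calculated as

$$\begin{aligned}
\rho_q^*(p(\hat{\eta})\|p(\theta(u))) &= \rho_q(p(\theta(u))\|p(\hat{\eta})) \\
&= \psi(\theta(u)) + \phi(\hat{\eta}) - \sum_{i=1}^n \theta^i(u)\,\hat{\eta}_i \\
&= \phi(\hat{\eta}) - \frac{1}{N}\log_q L_q(u),
\end{aligned}$$

where $\phi(\hat{\eta})$ does not depend on $u$.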

Hence the q-likelihood is maximum if and only if the canonical divergence
is minimum. By the same arguments as for standard exponential families,
we can say that the q-likelihood estimator is the orthogonal projection from
$\hat{\eta}$ to the model $M$ along a $\nabla^{q(m)}$-geodesic. Hence the
q-likelihood estimator is a quite natural generalization of the maximum likelihood
estimator from the viewpoint of differential geometry.
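The identity $\rho_q(p(\theta(u))\|p(\hat\eta)) + \frac{1}{N}\log_q L_q(u) = \phi(\hat\eta)$ can be checked on the discrete family of Example 2.2, where every term has a closed form. In this sketch (our own illustration; names hypothetical, all counts assumed positive) the point $\hat\eta_i = n_i/N$ is realized by the distribution with $p_i \propto n_i^{1/q}$, and the sum is evaluated at several model points; it stays at the constant $\phi(\hat\eta)$, so maximizing the q-likelihood and minimizing the canonical divergence select the same point.

```python
def log_q(x, q):
    """q-logarithm (x**(1 - q) - 1) / (1 - q), for x > 0 and q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def escort(p, q):
    """Escort distribution p(x)**q / sum_y p(y)**q."""
    omega = sum(pi ** q for pi in p)
    return [pi ** q / omega for pi in p]


def natural_params(p, q):
    """theta^i and psi(theta) of Example 2.2."""
    theta = [(pi ** (1 - q) - p[-1] ** (1 - q)) / (1 - q) for pi in p[:-1]]
    return theta, -log_q(p[-1], q)


def canonical_divergence(p_prime, p, q):
    """rho_q(p(theta') || p(theta)) = psi(theta') + phi(eta) - sum_i theta'^i eta_i."""
    theta_prime, psi_prime = natural_params(p_prime, q)
    theta, psi = natural_params(p, q)
    eta = escort(p, q)[:-1]
    phi = sum(t * e for t, e in zip(theta, eta)) - psi
    return psi_prime + phi - sum(t * e for t, e in zip(theta_prime, eta))


def log_q_likelihood(p, counts, q):
    """sum_i n_i log_q p_i for a discrete sample with counts n_i."""
    return sum(n * log_q(pi, q) for n, pi in zip(counts, p))


counts, q = [5, 3, 2], 0.7
N = sum(counts)
w = [n ** (1.0 / q) for n in counts]
p_hat = [wi / sum(w) for wi in w]  # distribution whose eta equals n_i / N
theta_hat, psi_hat = natural_params(p_hat, q)
phi_hat = sum(t * n / N for t, n in zip(theta_hat, counts)) - psi_hat  # phi(eta_hat)

for p_model in ([0.2, 0.3, 0.5], [0.6, 0.3, 0.1], p_hat):
    total = canonical_divergence(p_model, p_hat, q) + log_q_likelihood(p_model, counts, q) / N
    print(total - phi_hat)  # ~0 up to floating-point noise
```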

We remark that the q-likelihood can be generalized by U-geometry. The
notion of independence is related to geometric structures on the sample
space [4].

Acknowledgment 

The authors wish to express their sincere gratitude to the referee for his
careful reading of the paper and for his apt comments.

The  first  named  author  is  partially  supported  by  The  Toyota  Physical 
and  Chemical  Research  Institute  and  by  Grant-in-Aid  for  Encouragement 
of  Young  Scientists  (B)  No.  19740033,  Japan  Society  for  the  Promotion  of 
Science. 


References 

1.  S.  Amari  and  H.  Nagaoka,  Methods  of  information  geometry,  Amer.  Math. 
Soc.,  Providence,  Oxford  University  Press,  Oxford,  2000. 

2. E. P. Borges, A possible deformed algebra and calculus inspired in nonextensive thermostatistics, Phys. A, 340(2004), 95-101.

3.  D.  Ferrari  and  Y.  Yang,  Maximum  Lq-likelihood  estimation,  Ann.  Statist. 
38(2010),  753-783. 

4.  Y.  Fujimoto  and  N.  Murata,  A  Generalization  of  Independence  in  Naive 
Bayes  Model,  Lecture  Notes  in  Computer  Science,  6283(2010),  153-161. 

5. T. Kurose, Conformal-projective geometry of statistical manifolds, Interdiscip. Inform. Sci., 8(2002), 89-100.

6. H. Matsuzoe, On realization of conformally-projectively flat statistical manifolds and the divergences, Hokkaido Math. J., 27(1998), 409-421.

7. H. Matsuzoe, Geometry of contrast functions and conformal geometry, Hiroshima Math. J., 29(1999), 175-191.




8. H. Matsuzoe, Computational Geometry from the Viewpoint of Affine Differential Geometry, Lecture Notes in Computer Science, 5416(2009), 103-123.

9.  H.  Matsuzoe,  Geometry  for  statistical  inferences  in  complex  systems,  Toyota 
Research  Report,  63(2011),  177-180. 

10. J. Naudts, Estimators, escort probabilities, and φ-exponential families in statistical physics, JIPAM. J. Inequal. Pure Appl. Math., 5(2004), Article 102 (electronic).

11. J. Naudts, Generalised exponential families and associated entropy functions, Entropy, 10(2008), 131-149.

12.  J.  Naudts,  Generalised  Thermostatistics,  Springer,  2011. 

13.  A.  Ohara,  H.  Matsuzoe  and  S.  Amari,  A  dually  flat  structure  on  the  space  of 
escort  distributions,  J.  Phys.:  Conf.  Ser.  201(2010),  No.  012012  (electronic). 

14. A. Ohara, H. Matsuzoe and S. Amari, Dually flat structure with escort probability and its application to alpha-Voronoi diagrams, preprint, arXiv:1010.4965 [stat-mech].

15.  H.  Shima,  The  Geometry  of  Hessian  Structures,  World  Scientific,  2007. 

16.  H.  Suyari  and  M.  Tsukada,  Law  of  Error  in  Tsallis  Statistics,  IEEE  Trans. 
Inform.  Theory,  51(2005),  753-757. 

17. M. Tanaka, Meaning of an escort distribution and τ-transformation, J. Phys.: Conf. Ser. 201(2010), No. 012007 (electronic).

18. C. Tsallis, Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World, Springer, New York, 2009.


Received  January  31,  2011 
Revised  April  16,  2011 


A  Hessian  domain  constructed  with  a  foliation  by 
1-conformally  flat  statistical  manifolds 


by 

Keiko  UOHASHI 



Abstract. A Hessian domain is a flat statistical manifold, and its level surfaces
are 1-conformally flat statistical submanifolds. In this paper we show conditions
under which, conversely, 1-conformally flat statistical leaves of a foliation can be
realized as level surfaces of their common Hessian domain.

1.  Introduction 

Let $\varphi$ be a function on a domain $\Omega$ in a real affine space $A^{n+1}$. Denoting
by $D$ the canonical flat affine connection on $A^{n+1}$, we set $g = Dd\varphi$ and
suppose that $g$ is non-degenerate. Then a Hessian domain $(\Omega, D, g)$ is a flat
statistical manifold [8].

Kurose defined α-conformal equivalence and α-conformal flatness of statistical
manifolds [4]. In [9] we proved that n-dimensional level surfaces of
$\varphi$ are 1-conformally flat statistical submanifolds of $(\Omega, D, g)$. In addition,
we showed properties of foliations on Hessian domains with respect to statistical
submanifolds in [10]. Hao and Shima studied level surfaces on Hessian
domains in depth in [2] [7]. However, they studied foliations and statistical
submanifolds for given Hessian domains. Few results are known on the realization of
statistical manifolds on Hessian domains. In [9] we showed that a 1-conformally
flat statistical manifold can be locally realized as a submanifold of a flat
statistical manifold, by constructing a level surface of a Hessian domain. However,
we proved the realization of only a single 1-conformally flat statistical manifold. In
this paper we give conditions for the realization of 1-conformally flat statistical
manifolds as level surfaces of their common Hessian domain.
In Section 2 we recall properties of Hessian domains, statistical manifolds
and affine differential geometry. In Section 3 we prove a theorem on the
realization of 1-conformally flat statistical leaves. In Section 4 we show the
necessity of the conditions described in the theorem.


2.  Hessian  domains  and  Statistical  manifolds 

Let $D$ and $\{x^1, \dots, x^{n+1}\}$ be the canonical flat affine connection and the
canonical affine coordinate system on $A^{n+1}$, i.e., $Ddx^i = 0$. If the Hessian
$Dd\varphi = \sum_{i,j} (\partial^2\varphi / \partial x^i \partial x^j)\, dx^i dx^j$ is non-degenerate for a function $\varphi$ on a
domain $\Omega$ in $A^{n+1}$, we call $(\Omega, D, g = Dd\varphi)$ a Hessian domain. For a torsion-free
affine connection $\nabla$ and a pseudo-Riemannian metric $h$ on a manifold
$N$, the triple $(N, \nabla, h)$ is called a statistical manifold if $\nabla h$ is symmetric. If
the curvature tensor $R$ of $\nabla$ vanishes, $(N, \nabla, h)$ is said to be flat. A Hessian
domain $(\Omega, D, g = Dd\varphi)$ is a flat statistical manifold. Conversely, a flat
statistical manifold is locally a Hessian domain [1] [8].

For a statistical manifold $(N, \nabla, h)$, let $\nabla'$ be an affine connection on $N$
such that

$$Xh(Y, Z) = h(\nabla_X Y, Z) + h(Y, \nabla'_X Z) \quad \text{for } X, Y, Z \in TN,$$

where $TN$ is the set of all tangent vector fields on $N$. The affine connection $\nabla'$
is torsion free, and $\nabla' h$ is symmetric. Then $\nabla'$ is called the dual connection of
$\nabla$, and the triple $(N, \nabla', h)$ the dual statistical manifold of $(N, \nabla, h)$.

Let $A^*_{n+1}$ and $\{x^*_1, \dots, x^*_{n+1}\}$ be the dual affine space of $A^{n+1}$ and the
dual affine coordinate system of $\{x^1, \dots, x^{n+1}\}$, respectively. We define the
gradient mapping $\iota$ from $\Omega$ to $A^*_{n+1}$ by

$$x^*_i \circ \iota = \frac{\partial \varphi}{\partial x^i},$$

and a flat affine connection $D'$ on $\Omega$ by

$$\iota_*(D'_X Y) = D^*_X \iota_*(Y) \quad \text{for } X, Y \in T\Omega,$$

where $D^*_X \iota_*(Y)$ is the covariant derivative along $\iota$ induced by the canonical flat
affine connection $D^*$ on $A^*_{n+1}$. Then $(\Omega, D', g)$ is the dual statistical manifold
of $(\Omega, D, g)$.
For $\alpha \in \mathbb{R}$, statistical manifolds $(N, \nabla, h)$ and $(N, \bar\nabla, \bar h)$ are said to be
α-conformally equivalent if there exists a function $\phi$ on $N$ such that

$$\begin{aligned}
\bar h(X, Y) &= e^{\phi}\, h(X, Y), \\
h(\bar\nabla_X Y, Z) &= h(\nabla_X Y, Z) - \frac{1+\alpha}{2}\, d\phi(Z)\, h(X, Y) + \frac{1-\alpha}{2}\big\{d\phi(X)\, h(Y, Z) + d\phi(Y)\, h(X, Z)\big\}
\end{aligned}$$

for $X, Y, Z \in TN$. A statistical manifold $(N, \nabla, h)$ is called α-conformally
flat if $(N, \nabla, h)$ is locally α-conformally equivalent to a flat statistical manifold.
Statistical manifolds $(N, \nabla, h)$ and $(N, \bar\nabla, \bar h)$ are α-conformally equivalent
if and only if the dual statistical manifolds $(N, \nabla', h)$ and $(N, \bar\nabla', \bar h)$
are $(-\alpha)$-conformally equivalent. In particular, a statistical manifold $(N, \nabla, h)$
is 1-conformally flat if and only if the dual statistical manifold $(N, \nabla', h)$ is
$(-1)$-conformally flat [4].

Henceforth,  we  suppose  that  g  is  positive  definite. 

Let $E$ be the gradient vector field of $\varphi$ on $\Omega$ defined by

$$g(X, E) = d\varphi(X) \quad \text{for } X \in T\Omega,$$

where $T\Omega$ is the set of all tangent vector fields on $\Omega$. We set

$$\bar E = -d\varphi(E)^{-1} E \quad \text{on } \Omega_0 = \{p \in \Omega \mid d\varphi_p \neq 0\}.$$

For $p \in \Omega_0$, $\bar E_p$ is perpendicular to $T_pM$ with respect to $g$, where $M \subset \Omega_0$ is
the level surface of $\varphi$ containing $p$ and $T_pM$ is the set of all tangent vectors at
$p$ on $M$.
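To make the construction of $E$ and $\bar E$ concrete, here is a small numerical sketch. The potential $\varphi(x) = \sum_i x^i \log x^i$ on the positive orthant of $A^3$ is our own choice of example (not from the paper): its Hessian metric $g = \mathrm{diag}(1/x^i)$ is positive definite, so $g(X, E) = d\varphi(X)$ can be solved componentwise. The code checks that $d\varphi(\bar E) = -1$ and that $\bar E$ is $g$-orthogonal to a tangent vector of the level surface through the point.

```python
import math


def dphi(x):
    """Differential of phi(x) = sum_i x_i log x_i: components 1 + log x_i."""
    return [1.0 + math.log(xi) for xi in x]


def metric(x, u, v):
    """Hessian metric g = D d phi = diag(1 / x_i), evaluated on vectors u and v."""
    return sum(ui * vi / xi for xi, ui, vi in zip(x, u, v))


def gradient_field(x):
    """E with g(X, E) = d phi(X); since g is diagonal, E^i = x_i (1 + log x_i)."""
    return [xi * di for xi, di in zip(x, dphi(x))]


def normalized_transversal(x):
    """E_bar = -d phi(E)^{-1} E, defined where d phi does not vanish."""
    E = gradient_field(x)
    dphi_E = sum(d * e for d, e in zip(dphi(x), E))  # equals g(E, E) > 0
    if dphi_E == 0.0:
        raise ValueError("d phi vanishes at this point")
    return [-e / dphi_E for e in E]


x = [0.2, 0.5, 1.3]
E_bar = normalized_transversal(x)
v = [1.0, -2.0, 0.5]
# Project v onto T_pM: X = v + d phi(v) E_bar satisfies d phi(X) = 0 because d phi(E_bar) = -1.
dv = sum(d * vi for d, vi in zip(dphi(x), v))
X = [vi + dv * ei for vi, ei in zip(v, E_bar)]
print(sum(d * e for d, e in zip(dphi(x), E_bar)))  # d phi(E_bar) = -1
print(sum(d * c for d, c in zip(dphi(x), X)))      # d phi(X), about 0
print(metric(x, E_bar, X))                         # g(E_bar, X), about 0
```

The orthogonality follows from $g(\bar E, X) = -d\varphi(E)^{-1} g(E, X) = -d\varphi(E)^{-1} d\varphi(X) = 0$ for tangent vectors $X$ of $M$.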

Let $x$ be the canonical immersion of an n-dimensional level surface $M$ into
$\Omega$. For $D$ and the affine immersion $(x, \bar E)$, we can define the induced affine
connection $D^{\bar E}$ and the affine fundamental form $g^{\bar E}$ on $M$ by

$$D_X Y = D^{\bar E}_X Y + g^{\bar E}(X, Y)\,\bar E \quad \text{for } X, Y \in TM.$$

We denote by $D^M$ and $g^M$ the connection and the Riemannian metric on
$M$ induced by $D$ and $g$. Then the triple $(M, D^M, g^M)$ is the statistical
submanifold realized in $(\Omega, D, g)$, which coincides with the manifold $(M, D^{\bar E}, g^{\bar E})$
induced by the affine immersion $(x, \bar E)$. This fact leads to the next theorem.
Theorem 2.1. ([9]) Let $M$ be a simply connected n-dimensional level surface
of $\varphi$ on an $(n+1)$-dimensional Hessian domain $(\Omega, D, g = Dd\varphi)$ with
a Riemannian metric $g$, and suppose that $n \geq 2$. If we regard $(\Omega, D, g)$ as a
flat statistical manifold, then $(M, D^M, g^M)$ is a 1-conformally flat statistical
submanifold of $(\Omega, D, g)$, where we denote by $D^M$ and $g^M$ the connection and
the Riemannian metric on $M$ induced by $D$ and $g$.

Conversely, on the realization of a 1-conformally flat statistical manifold we
have:

Theorem 2.2. ([9]) An arbitrary 1-conformally flat statistical manifold
of dimension $n \geq 2$ with a Riemannian metric can be locally realized as a
submanifold of a flat statistical manifold of dimension $n+1$.


3. Foliations constructed by 1-conformally flat statistical manifolds

Let $\mathcal{F}$ be a foliation of dimension $n \geq 2$ and codimension 1 on a differentiable
manifold $N$, and for each leaf $M \in \mathcal{F}$ let the triple $(M, \nabla^M, h^M)$ be a
1-conformally flat statistical manifold. Suppose that a non-degenerate affine
immersion $(x^M, E^M)$ realizes $(M, \nabla^M, h^M)$ in $A^{n+1}$, and that the mapping
$x : N \to \Omega$ defined by $x(p) = x^M(p)$ for $p \in M$ is a diffeomorphism, where
$\Omega = \bigcup_{M \in \mathcal{F}} x^M(M) \subset A^{n+1}$ is a domain diffeomorphic to $N$.

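Let $\iota^M$ be the conormal immersion for $x^M$, i.e., denoting by $\langle a, b\rangle$ the
pairing of $a \in A^*_{n+1}$ and $b \in A^{n+1}$,

$$\langle \iota^M(p), Y_p\rangle = 0 \ \text{ for } Y_p \in T_pM, \qquad \langle \iota^M(p), E^M_p\rangle = 1$$

for $p \in M$, identifying $T_pA^{n+1}$ with $A^{n+1}$. The immersion $\iota^M$ satisfies

$$\langle \iota^M_*(Y), E^M\rangle = 0, \qquad \langle \iota^M_*(Y), X\rangle = -h^M(Y, X) \quad \text{for } X, Y \in TM.$$

Moreover, the conormal immersion $\iota^M$ is equiaffine, i.e.,

$$D_X E^M = S^{E^M}(X) \in TM \quad \text{for } X \in TM$$

(we call $S^{E^M}$ the shape operator) [5] [6] [9]. With the notation of this section,
we can write

$$D_X Y = \nabla^M_X Y + h^M(X, Y)\,E^M \quad \text{for } X, Y \in TM.$$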
Then the next theorem holds.


Theorem 3.1. If a foliation $\mathcal{F}$ satisfies the following conditions, each
1-conformally flat statistical leaf $(M, \nabla^M, h^M)$ of $\mathcal{F}$ is realized as a level
surface of the common Hessian domain:

(i) the mapping $E : N \to A^{n+1}$ defined by $E(p) = E^M(p)$ for $p \in M$ is
differentiable;

(ii) the mapping $\iota : N \to \Omega^*$ defined by $\iota(p) = \iota^M(p)$ for $p \in M$ is a
diffeomorphism, where $\Omega^* = \bigcup_{M \in \mathcal{F}} \iota^M(M) \subset A^*_{n+1}$;

(iii) $D_E E = \mu E$ for $\mu \in \mathbb{R}$;

(iv) $S^{E^M}(X) = -(d\lambda(E) + 1)X$ on $M$, where $\lambda$ is a function on $N$ such
that  ex@h(p)  =  i(p),  p  G  N  for  p  G  M. 

Proof. We regard the manifold $N$ as a domain $\Omega \subset A^{n+1}$, and define a
metric $g$ on $\Omega$ by

$$g(Y, X) = h^M(Y, X), \qquad g(E, E) = 1, \qquad g(Y, E) = 0 \quad \text{for } X, Y \in TM \subset T\Omega.$$

Let us prove that $(D, g)$ satisfies the Codazzi equation

$$(D_X g)(Y, Z) = (D_Y g)(X, Z) \quad \text{for all } X, Y, Z \in T\Omega.$$

In the case of $X, Y, Z \in TM$, since the $E$-components of $D_X Y$ and $D_X Z$ are
$g$-orthogonal to $TM$, we have

$$\begin{aligned}
(D_X g)(Y, Z) &= X(g(Y, Z)) - g(D_X Y, Z) - g(Y, D_X Z) \\
&= X(h^M(Y, Z)) - g(\nabla^M_X Y, Z) - g(Y, \nabla^M_X Z) \\
&= (\nabla^M_X h^M)(Y, Z).
\end{aligned}$$

Similarly it holds that

$$(D_Y g)(X, Z) = (\nabla^M_Y h^M)(X, Z).$$

Recall the Codazzi equation for an equiaffine immersion $(x^M, E^M)$:

$$(\nabla^M_X h^M)(Y, Z) = (\nabla^M_Y h^M)(X, Z)$$

[6]. Then we have the Codazzi equation for $(D, g)$.




In the case of $X, Y \in TM$ and $E$ on $M$, we have

$$\begin{aligned}
(D_X g)(Y, E) &= X(g(Y, E)) - g(h^M(X, Y)E, E) - g(Y, D_X E) \\
&= -h^M(X, Y) - h^M(Y, S^{E^M}(X)).
\end{aligned}$$

Similarly it holds that

$$(D_Y g)(X, E) = -h^M(X, Y) - h^M(X, S^{E^M}(Y)).$$

Recall the Ricci equation for an equiaffine immersion $(x^M, E^M)$:

$$h^M(S^{E^M}(X), Y) = h^M(X, S^{E^M}(Y))$$

[6]. Then we have the Codazzi equation

$$(D_X g)(Y, E) = (D_Y g)(X, E).$$

In the case of $X, Z \in TM$ and $E$ on $M$, similarly we have

$$(D_X g)(E, Z) = -h^M(X, Z) - h^M(S^{E^M}(X), Z).$$

Now recall the property $\langle \iota^M_*(X), E^M\rangle = 0$ for $X \in TM$ and condition (iii)
$D_E E = \mu E$. Then we have $D_E X = 0$ for $X \in TM$. In addition, the conormal
immersions $\{(\iota^M, h^M)\}_{M \in \mathcal{F}}$ are projectively equivalent and conformally
equivalent, with their metrics related by the conformal factor $e^{\lambda}$ [6]. Hence for
$p \in M$ the following holds:

$$E(g(X, Z))|_p = E(e^{\lambda} h^M(X, Z))|_p = (E e^{\lambda})|_p\, h^M(X, Z) = (E\lambda)|_p\, e^{\lambda(p)} h^M(X, Z) = d\lambda(E)|_p\, h^M(X, Z).$$

Thus it holds that

$$(D_E g)(X, Z) = E(g(X, Z)) - g(D_E X, Z) - g(X, D_E Z) = d\lambda(E)|_p\, h^M(X, Z).$$

By condition (iv) we have the Codazzi equation

$$(D_X g)(E, Z) = (D_E g)(X, Z).$$
In the case of $X \in TM$ and $E$ on $M$, we have

$$(D_X g)(E, E) = X(g(E, E)) - g(D_X E, E) - g(E, D_X E) = -2g(S^{E^M}(X), E) = 0.$$

Moreover, by $D_E X = 0$ and $D_E E = \mu E$ it holds that

$$(D_E g)(X, E) = E(g(X, E)) - g(D_E X, E) - g(X, D_E E) = 0.$$

Thus we have the Codazzi equation

$$(D_X g)(E, E) = (D_E g)(X, E).$$

In the case of $X = Y = E$ and $Z \in T\Omega$, clearly we have
$(D_X g)(Y, Z) = (D_Y g)(X, Z) = (D_E g)(E, Z)$.

Hence $(D, g)$ satisfies the Codazzi equation. Thus $g$ is a Hessian metric
by Proposition 2.1 in [8]. By the definition of $g$, we can regard each
leaf $(M, \nabla^M, h^M)$ of $\mathcal{F}$ as a level surface of the Hessian domain $(\Omega, D, g)$. □

4.  Necessity  of  the  conditions 

In this section we show that level surfaces of a Hessian domain satisfy the
conditions of Theorem 3.1.

Let $(\Omega, D, g = Dd\varphi)$ be a simply connected $(n+1)$-dimensional Hessian
domain, and $(M, D^M, g^M)$ an n-dimensional 1-conformally flat statistical
submanifold on a level surface $M$ of $\varphi$.

It is clear that the mapping $\bar E : \Omega \to A^{n+1}$ defined by $\bar E(p) = E^M(p)$ for
$p \in M$ is differentiable, where an immersion $(x^M, E^M)$ realizes $(M, D^M, g^M)$
in $A^{n+1}$. It is also clear that the gradient mapping $\iota : \Omega \to \Omega^* = \iota(\Omega)$
is a diffeomorphism and coincides with the conormal immersion for $x^M$ on
$M$. Thus each level surface $(M, D^M, g^M)$ satisfies conditions (i) and (ii) of
Theorem 3.1.

For the proof of condition (iii), we calculate $(D_{\bar E} g)(\bar E, X)$ and $(D_X g)(\bar E, \bar E)$
for $X \in TM$. By the definitions of the gradient vector field $E$ for $g$ and the
conormal vector field $\bar E = -d\varphi(E)^{-1} E$, we have

$$\begin{aligned}
(D_{\bar E} g)(\bar E, X) &= \bar E(g(\bar E, X)) - g(D_{\bar E}\bar E, X) - g(\bar E, D_{\bar E} X) \\
&= -g(D_{\bar E}\bar E, X) - d\varphi(E)^{-2}\, d\varphi(D_E X) \\
&= -g(D_{\bar E}\bar E, X) - d\varphi(E)^{-2}\big(E(d\varphi(X)) - (D_E d\varphi)(X)\big) \\
&= -g(D_{\bar E}\bar E, X).
\end{aligned}$$

In the above we also make use of $d\varphi(X) = 0$ and $(D_E d\varphi)(X) = g(E, X) = 0$.
Moreover it holds that

$$(D_X g)(\bar E, \bar E) = X(g(\bar E, \bar E)) - 2g(D_X \bar E, \bar E) = -2g(S^{E^M}(X), \bar E) = 0.$$

From the Codazzi equation for $(D, g)$, it follows that

$$(D_{\bar E} g)(\bar E, X) = (D_X g)(\bar E, \bar E) = 0.$$

Thus $D_{\bar E}\bar E = \mu \bar E$ for $\mu \in \mathbb{R}$. Therefore $(M, D^M, g^M)$ satisfies condition
(iii) of Theorem 3.1.


Remark 4.1. Hao and Shima calculated $(D_E g)(E, X)$ and $(D_X g)(E, E)$
not for $(x, \bar E)$ but for $(x, E)$, and showed that the transversal connection form
$\tau^E$ vanishes if and only if $D_E E = \mu E$ [2] [8]. We carried out the above calculation
using their technique.


For the proof of condition (iv), we calculate $(D_X g)(\bar E, Z)$ and $(D_{\bar E} g)(X, Z)$
for $X, Z \in TM$. By the calculation in the proof of Theorem 3.1, we have

$$(D_X g)(\bar E, Z) = -g(X, Z) - g(S^{E^M}(X), Z), \qquad (D_{\bar E} g)(X, Z) = d\lambda(\bar E)|_p\, h^M(X, Z),$$

where $\lambda$ is a function on $\Omega$ defined similarly to $\lambda$ in Theorem 3.1. From the
Codazzi equation for $(D, g)$, it follows that

$$(D_X g)(\bar E, Z) = (D_{\bar E} g)(X, Z).$$

Thus $(M, D^M, g^M)$ satisfies condition (iv): $S^{E^M}(X) = -(d\lambda(\bar E) + 1)X$.

We summarize the necessity of conditions (i) to (iv) as follows.


Corollary 4.2. Each 1-conformally flat statistical leaf $(M, \nabla^M, h^M)$ of
a foliation $\mathcal{F}$ is realized as a level surface of the common Hessian domain if
and only if $\mathcal{F}$ satisfies conditions (i) to (iv) of Theorem 3.1.


Lastly, we discuss projectively flat connections and dual-projectively
flat connections. Kurose and Ivanov proved the following propositions,
respectively.


Proposition 4.3. ([4]) A statistical manifold $(N, \nabla, h)$ is 1-conformally
flat if and only if the dual connection $\nabla'$ is a projectively flat connection with
symmetric Ricci tensor.


Proposition 4.4. ([3]) A statistical manifold $(N, \nabla, h)$ is 1-conformally
flat if and only if $\nabla$ is a dual-projectively flat connection with symmetric
Ricci tensor.


Thus we can restate Corollary 4.2 as follows.


Corollary 4.5. Let $\nabla^M$ be a dual-projectively flat connection with symmetric
Ricci tensor for all $M \in \mathcal{F}$. Then each statistical leaf $(M, \nabla^M, h^M)$
of a foliation $\mathcal{F}$ is realized as a level surface of the common Hessian domain
if and only if $\mathcal{F}$ satisfies conditions (i) to (iv) of Theorem 3.1.
Harmonic maps relative to
α-connections on statistical manifolds


Keiko  Uohashi 


Abstract. In this paper we study harmonic maps relative to α-connections,
and not necessarily relative to Levi-Civita connections, on statistical manifolds.
In particular, harmonic maps on α-conformally equivalent statistical
manifolds are discussed, and conditions for harmonicity are given in terms of the
parameter α and the dimension n. As an application we also describe harmonic
maps between level surfaces of a Hessian domain with α-conformally
flat connections.

M.S.C. 2010: 53A15, 53C43.

Key words: harmonic map; statistical manifold; dual connection; conformal
transformation; Hessian domain.

1  Introduction 

Harmonic maps are important objects of research in geometry, physics, and related
fields. On the other hand, statistical manifolds have been studied in terms of affine
geometry, information geometry, statistical mechanics, and so on [1]. In relation to these,
Shima gave conditions for harmonicity of gradient mappings of level surfaces on a Hessian
domain, which is a typical example of a dually flat statistical manifold [7] [8].

Level surfaces on a Hessian domain are known to be 1- and (-1)-conformally flat
statistical manifolds for the primal connection and for the dual connection, respectively
[10]. The gradient mappings are then regarded as harmonic maps relative to the dual
connection, i.e., the (-1)-connection. However, Shima investigated harmonic maps from
n-dimensional level surfaces into an (n+1)-dimensional dual affine space, and not
into the other level surfaces. In addition, Nomizu and Sasaki calculated the Laplacian
of centro-affine immersions into an affine space, which generate projectively flat
statistical manifolds, i.e., (-1)-conformally flat statistical manifolds. However, they
show no harmonic maps between two centro-affine hypersurfaces in [6].

We therefore treat harmonic maps relative to α-connections between α-conformally
equivalent statistical manifolds, including the cases α = -1, 0 (the 0-connection
is the Levi-Civita connection). In this paper, the existence of nontrivial harmonic
maps for α-connections is shown under conditions on the parameter α and the dimension n.
Finally, we describe harmonic maps between level surfaces of a Hessian domain for
α-conformally flat connections.

Applied  Sciences,  Vol.14,  2012,  pp.  82-88. 

©  Balkan  Society  of  Geometers,  Geometry  Balkan  Press  2012. 
2 Statistical manifolds and α-conformal equivalence

We  recall  definitions  of  terms  on  statistical  manifolds. 

For a torsion-free affine connection $\nabla$ and a pseudo-Riemannian metric $h$ on a
manifold $N$, the triple $(N, \nabla, h)$ is called a statistical manifold if $\nabla h$ is symmetric. If
the curvature tensor $R$ of $\nabla$ vanishes, $(N, \nabla, h)$ is said to be flat.

For a statistical manifold $(N, \nabla, h)$, let $\nabla'$ be an affine connection on $N$ such that

$$Xh(Y, Z) = h(\nabla_X Y, Z) + h(Y, \nabla'_X Z) \quad \text{for } X, Y, Z \in \Gamma(TN),$$

where $\Gamma(TN)$ is the set of smooth tangent vector fields on $N$. The affine connection
$\nabla'$ is torsion free, and $\nabla' h$ is symmetric. Then $\nabla'$ is called the dual connection of $\nabla$, the
triple $(N, \nabla', h)$ the dual statistical manifold of $(N, \nabla, h)$, and $(\nabla, \nabla', h)$ the dualistic
structure on $N$. The curvature tensor of $\nabla'$ vanishes if and only if that of $\nabla$ does,
and then $(\nabla, \nabla', h)$ is called the dually flat structure [1].

For a real number $\alpha$, statistical manifolds $(N, \nabla, h)$ and $(N, \bar\nabla, \bar h)$ are said to be
α-conformally equivalent if there exists a function $\phi$ on $N$ such that

$$\bar h(X, Y) = e^{\phi}\, h(X, Y), \tag{2.1}$$

$$h(\bar\nabla_X Y, Z) = h(\nabla_X Y, Z) - \frac{1+\alpha}{2}\, d\phi(Z)\, h(X, Y) + \frac{1-\alpha}{2}\big\{d\phi(X)\, h(Y, Z) + d\phi(Y)\, h(X, Z)\big\} \tag{2.2}$$

for $X, Y, Z \in \Gamma(TN)$. Two statistical manifolds $(N, \nabla, h)$ and $(N, \bar\nabla, \bar h)$ are
α-conformally equivalent if and only if the dual statistical manifolds $(N, \nabla', h)$ and
$(N, \bar\nabla', \bar h)$ are $(-\alpha)$-conformally equivalent. A statistical manifold $(N, \nabla, h)$ is called
α-conformally flat if $(N, \nabla, h)$ is locally α-conformally equivalent to a flat statistical
manifold [4].

3 Harmonic maps for α-conformal equivalence

Let $(N, \nabla, h)$ and $(N, \bar\nabla, \bar h)$ be α-conformally equivalent statistical manifolds of $\dim N = n \geq 2$, and $\{x^1, \dots, x^n\}$ a local coordinate system on $N$. Suppose that $h$ and $\bar h$ are Riemannian metrics. We set $h_{ij} = h(\partial/\partial x^i, \partial/\partial x^j)$ and $[h^{ij}] = [h_{ij}]^{-1}$. Let $\pi_{id}: N \to N$ be the identity map, i.e., $\pi_{id}(x) = x$ for $x \in N$, and $\pi_{id*}$ the differential of $\pi_{id}$. When we wish to make the metrics and connections explicit, we write $\pi_{id}: (N, \nabla, h) \to (N, \bar\nabla, \bar h)$. We define a harmonic map relative to $(h, \nabla, \bar\nabla)$ as follows.

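Definition 3.1. If the tension field $\tau_{(h,\nabla,\bar\nabla)}(\pi_{id})$ vanishes, i.e., $\tau_{(h,\nabla,\bar\nabla)}(\pi_{id}) = 0$ on $N$, the map $\pi_{id}: (N, \nabla, h) \to (N, \bar\nabla, \bar h)$ is said to be a harmonic map relative to $(h, \nabla, \bar\nabla)$, where the tension field is defined by

$$\tau_{(h,\nabla,\bar\nabla)}(\pi_{id}) := \sum_{i,j=1}^{n} h^{ij}\left\{\bar\nabla_{\frac{\partial}{\partial x^i}} \pi_{id*}\frac{\partial}{\partial x^j} - \pi_{id*}\left(\nabla_{\frac{\partial}{\partial x^i}} \frac{\partial}{\partial x^j}\right)\right\} \in \Gamma(\pi_{id}^{-1}TN) \tag{3.1}$$

$$= \sum_{i,j=1}^{n} h^{ij}\left(\bar\nabla_{\frac{\partial}{\partial x^i}} \frac{\partial}{\partial x^j} - \nabla_{\frac{\partial}{\partial x^i}} \frac{\partial}{\partial x^j}\right) \in \Gamma(TN). \tag{3.2}$$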


Then the next theorem holds.

Theorem 3.1. For α-conformally equivalent statistical manifolds $(N, \nabla, h)$ and $(N, \bar\nabla, \bar h)$ of $\dim n \geq 2$ satisfying (2.1) and (2.2), if $\alpha = -(n-2)/(n+2)$ or $\phi$ is a constant function on $N$, the identity map $\pi_{id}: (N, \nabla, h) \to (N, \bar\nabla, \bar h)$ is a harmonic map relative to $(h, \nabla, \bar\nabla)$.

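Proof. By (2.2) and (3.2), for $k \in \{1, \dots, n\}$ we have

$$\begin{aligned}
h\left(\tau_{(h,\nabla,\bar\nabla)}(\pi_{id}), \frac{\partial}{\partial x^k}\right)
&= \sum_{i,j=1}^{n} h^{ij}\left\{h\left(\bar\nabla_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}, \frac{\partial}{\partial x^k}\right) - h\left(\nabla_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}, \frac{\partial}{\partial x^k}\right)\right\} \\
&= \sum_{i,j=1}^{n} h^{ij}\left\{-\frac{1+\alpha}{2}\frac{\partial\phi}{\partial x^k} h_{ij} + \frac{1-\alpha}{2}\left(\frac{\partial\phi}{\partial x^i} h_{jk} + \frac{\partial\phi}{\partial x^j} h_{ik}\right)\right\} \\
&= -\frac{1+\alpha}{2}\, n\, \frac{\partial\phi}{\partial x^k} + \frac{1-\alpha}{2}\left(\sum_{i=1}^{n}\delta_{ik}\frac{\partial\phi}{\partial x^i} + \sum_{j=1}^{n}\delta_{jk}\frac{\partial\phi}{\partial x^j}\right) \\
&= \left(-\frac{1+\alpha}{2}\, n + \frac{1-\alpha}{2}\cdot 2\right)\frac{\partial\phi}{\partial x^k}
= -\frac{1}{2}\{(n+2)\alpha + (n-2)\}\frac{\partial\phi}{\partial x^k},
\end{aligned}$$

where $\delta_{ij}$ is the Kronecker delta. Since $h$ is nondegenerate, $\tau_{(h,\nabla,\bar\nabla)}(\pi_{id}) = 0$ holds if and only if, at each point of $N$, $(n+2)\alpha + (n-2) = 0$ or $\partial\phi/\partial x^k = 0$ for all $k \in \{1, \dots, n\}$. In particular, either condition of the theorem makes the tension field vanish. Thus we obtain Theorem 3.1. □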
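The cancellation in this proof can be spot-checked numerically. The sketch below is our own illustration, not code from the paper, and the helper names are hypothetical. It uses only the right-hand side of (2.2): for a random positive-definite metric $h$ and a random gradient $d\phi$, it contracts that expression with $h^{ij}$ in exact rational arithmetic and compares the result with $-\frac{1}{2}\{(n+2)\alpha + (n-2)\}\,\partial\phi/\partial x^k$.

```python
from fractions import Fraction
import random


def inverse(mat):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(mat)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def random_metric(n, rng):
    """Symmetric positive-definite h = A^T A + I with small integer entries."""
    a = [[Fraction(rng.randint(-3, 3)) for _ in range(n)] for _ in range(n)]
    return [[sum(a[r][i] * a[r][j] for r in range(n)) + (1 if i == j else 0)
             for j in range(n)] for i in range(n)]


def contracted_component(h, dphi, alpha, k):
    """sum_{i,j} h^{ij} * (RHS of (2.2) minus h(nabla_X Y, Z)), X=d_i, Y=d_j, Z=d_k."""
    n = len(h)
    h_inv = inverse(h)
    c_minus = (1 + alpha) / 2
    c_plus = (1 - alpha) / 2
    total = Fraction(0)
    for i in range(n):
        for j in range(n):
            term = (-c_minus * dphi[k] * h[i][j]
                    + c_plus * (dphi[i] * h[j][k] + dphi[j] * h[i][k]))
            total += h_inv[i][j] * term
    return total


def closed_form(n, alpha, dphi_k):
    """-(1/2){(n+2) alpha + (n-2)} dphi/dx^k, the last line of the proof."""
    return -Fraction(1, 2) * ((n + 2) * alpha + (n - 2)) * dphi_k


rng = random.Random(0)
for n in range(2, 6):
    h = random_metric(n, rng)
    dphi = [Fraction(rng.randint(-5, 5), rng.randint(1, 4)) for _ in range(n)]
    alpha = Fraction(rng.randint(-9, 9), 7)
    assert all(contracted_component(h, dphi, alpha, k) == closed_form(n, alpha, dphi[k])
               for k in range(n))
```

Exact fractions are used so that the identity is checked without floating-point tolerance.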


4 α-connections on level surfaces of a Hessian domain


In this section we describe relations between α-connections and Hessian domains.

Let $N$ be a manifold with a dualistic structure $(\nabla, \nabla', h)$. For $\alpha \in \mathbb{R}$, the affine connection defined by

$$\nabla^{(\alpha)} := \frac{1+\alpha}{2}\nabla + \frac{1-\alpha}{2}\nabla' \tag{4.1}$$

is called an α-connection of $(N, \nabla, h)$. The triple $(N, \nabla^{(\alpha)}, h)$ is also a statistical manifold, and the dual connection of $\nabla^{(\alpha)}$ is $\nabla^{(-\alpha)}$. The 1-connection, the $(-1)$-connection and the 0-connection coincide with $\nabla$, $\nabla'$ and the Levi-Civita connection of $(N, h)$, respectively. An α-connection is not always flat [1].
Let $D$ and $\{x^1, \dots, x^{n+1}\}$ be the canonical flat affine connection and the canonical affine coordinate system on $\mathbf{A}^{n+1}$, i.e., $D\,dx^i = 0$. If the Hessian $D\,d\psi = \sum_{i,j}(\partial^2\psi/\partial x^i\partial x^j)\,dx^i dx^j$ is non-degenerate for a function $\psi$ on a domain $\Omega$ in $\mathbf{A}^{n+1}$, we call $(\Omega, D, g = D\,d\psi)$ a Hessian domain. A Hessian domain is a flat statistical manifold. Conversely, a flat statistical manifold is locally a Hessian domain [1] [8].

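Let $\mathbf{A}_{n+1}^{*}$ and $\{x_1^*, \dots, x_{n+1}^*\}$ be the dual affine space of $\mathbf{A}^{n+1}$ and the dual affine coordinate system of $\{x^1, \dots, x^{n+1}\}$, respectively. We define the gradient mapping $\iota$ from $\Omega$ to $\mathbf{A}_{n+1}^{*}$ by

$$x_i^* \circ \iota = -\frac{\partial\psi}{\partial x^i}, \quad i = 1, \dots, n+1,$$

and a flat affine connection $D'$ on $\Omega$ by

$$\iota_*(D'_X Y) = D^*_X \iota_*(Y) \quad \text{for } X, Y \in \Gamma(T\Omega),$$

where $D^*_X \iota_*(Y)$ is the covariant derivative along $\iota$ induced by the canonical flat affine connection $D^*$ on $\mathbf{A}_{n+1}^{*}$. Then $(\Omega, D', g)$ is the dual statistical manifold of $(\Omega, D, g)$ [7] [8].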

For a simply connected level surface $M$ of $\psi$ with $\dim M = n \geq 2$, we denote by $D^M$ and $g^M$ the connection and the Riemannian metric on $M$ induced by $D$ and $g$, respectively. Then $(M, D^M, g^M)$ is a 1-conformally flat statistical submanifold of $(\Omega, D, g)$ by Theorem 2.1 in [10].

We consider two simply connected level surfaces $(M, D, g)$ and $(\hat M, \hat D, \hat g)$ of $\dim n \geq 2$, which are 1-conformally flat statistical submanifolds of $(\Omega, D, g)$. For $p \in M$, let $\lambda$ be a function on $M$ such that $e^{\lambda(p)}\iota(p) \in \hat\iota(\hat M)$, where $\hat\iota$ is the restriction of the gradient mapping $\iota$ to $\hat M$, and set $(e^{\lambda})(p) = e^{\lambda(p)}$. Note that the function $e^{\lambda}$ describes the projection of $M$ to $\hat M$ with respect to the dual affine coordinate system of $\Omega$.

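We define a map $\pi: M \to \hat M$ by

$$\hat\iota \circ \pi = e^{\lambda}\iota,$$

denoting also by $\iota$ the restriction of the gradient mapping $\iota$ to $M$. We denote by $\bar D'$ an affine connection on $M$ defined by

$$\pi_*(\bar D'_X Y) = \hat D'_{\pi_*(X)}\pi_*(Y) \quad \text{for } X, Y \in \Gamma(TM),$$

and by $\bar g$ a Riemannian metric on $M$ such that

$$\bar g(X, Y) = e^{\lambda} g(X, Y) = \hat g(\pi_*(X), \pi_*(Y)).$$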

Then the next theorem is known (cf. [4] [5]).

Theorem 4.1 ([11]). For affine connections $D'$ and $\bar D'$ on $M$, we have

(i) $D'$ and $\bar D'$ are projectively equivalent.

(ii) $(M, D', g)$ and $(M, \bar D', \bar g)$ are $(-1)$-conformally equivalent.

We denote by $\bar D$ an affine connection on $M$ defined by

$$\pi_*(\bar D_X Y) = \hat D_{\pi_*(X)}\pi_*(Y) \quad \text{for } X, Y \in \Gamma(TM).$$

From the duality of $\hat D$ and $\hat D'$, $\bar D$ is the dual connection of $\bar D'$ on $M$. Then the next theorem holds (cf. [3] [4]).

Theorem 4.2 ([11]). For affine connections $D$ and $\bar D$ on $M$, we have

(i) $D$ and $\bar D$ are dual-projectively equivalent.

(ii) $(M, D, g)$ and $(M, \bar D, \bar g)$ are 1-conformally equivalent.

For the α-connections $D^{(\alpha)}$ and $\bar D^{(\alpha)}$ defined similarly to (4.1), we obtain the next corollary by Theorem 4.1, Theorem 4.2 and (2.2) with $\phi = \lambda$ [9].

Corollary 4.3. For affine connections $D^{(\alpha)}$ and $\bar D^{(\alpha)}$ on $M$, $(M, D^{(\alpha)}, g)$ and $(M, \bar D^{(\alpha)}, \bar g)$ are α-conformally equivalent.
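For readers who want the intermediate step behind Corollary 4.3, here is a short derivation sketch. It assumes that $\bar D^{(\alpha)} = \frac{1+\alpha}{2}\bar D + \frac{1-\alpha}{2}\bar D'$, exactly as in (4.1), and reads Theorems 4.2 (ii) and 4.1 (ii) through (2.2) with $\phi = \lambda$ and $h = g$ (the cases $\alpha = 1$ and $\alpha = -1$):

```latex
\begin{aligned}
g(\bar D_X Y - D_X Y, Z) &= -\,d\lambda(Z)\, g(X, Y)
  && \text{(Theorem 4.2 (ii), 1-conformal)}\\
g(\bar D'_X Y - D'_X Y, Z) &= d\lambda(X)\, g(Y, Z) + d\lambda(Y)\, g(X, Z)
  && \text{(Theorem 4.1 (ii), (-1)-conformal)}\\
g(\bar D^{(\alpha)}_X Y - D^{(\alpha)}_X Y, Z)
  &= \tfrac{1+\alpha}{2}\, g(\bar D_X Y - D_X Y, Z)
   + \tfrac{1-\alpha}{2}\, g(\bar D'_X Y - D'_X Y, Z)\\
  &= -\tfrac{1+\alpha}{2}\, d\lambda(Z)\, g(X, Y)
   + \tfrac{1-\alpha}{2}\,\{d\lambda(X)\, g(Y, Z) + d\lambda(Y)\, g(X, Z)\}.
\end{aligned}
```

The last line is (2.2) with $\phi = \lambda$, $\nabla = D^{(\alpha)}$ and $\bar\nabla = \bar D^{(\alpha)}$, while (2.1) is the relation $\bar g = e^{\lambda}g$ from the definition of $\bar g$.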


5 Harmonic maps relative to α-connections on level surfaces

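We denote $\hat D^{(\alpha)}_{\pi_*(X)}\pi_*(Y)$ by $\hat D^{(\alpha)}_X \pi_*(Y)$, regarding it as a section of the induced bundle $\pi^{-1}T\hat M$. Let $\{x^1, \dots, x^n\}$ be a local coordinate system on $M$. A harmonic map between level surfaces $(M, D^{(\alpha)}, g)$ and $(\hat M, \hat D^{(\alpha)}, \hat g)$ is defined as follows.

Definition 5.1. If the tension field $\tau_{(g, D^{(\alpha)}, \hat D^{(\alpha)})}(\pi)$ vanishes, i.e., $\tau_{(g, D^{(\alpha)}, \hat D^{(\alpha)})}(\pi) = 0$ on $M$, the map $\pi: (M, D^{(\alpha)}, g) \to (\hat M, \hat D^{(\alpha)}, \hat g)$ is said to be a harmonic map relative to $(g, D^{(\alpha)}, \hat D^{(\alpha)})$, where the tension field is defined by

$$\tau_{(g, D^{(\alpha)}, \hat D^{(\alpha)})}(\pi) := \sum_{i,j=1}^{n} g^{ij}\left\{\hat D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\pi_*\frac{\partial}{\partial x^j} - \pi_*\left(D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}\right)\right\} \in \Gamma(\pi^{-1}T\hat M). \tag{5.1}$$

Now we give conditions for harmonicity of a map $\pi: M \to \hat M$ relative to $(g, D^{(\alpha)}, \hat D^{(\alpha)})$.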

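Theorem 5.1. Let $(M, D^{(\alpha)}, g)$ and $(\hat M, \hat D^{(\alpha)}, \hat g)$ be simply connected $n$-dimensional level surfaces of an $(n+1)$-dimensional Hessian domain $(\Omega, D, g)$ with $n \geq 2$. If $\alpha = -(n-2)/(n+2)$ or $\lambda$ is a constant function on $M$, the map $\pi: (M, D^{(\alpha)}, g) \to (\hat M, \hat D^{(\alpha)}, \hat g)$ is a harmonic map relative to $(g, D^{(\alpha)}, \hat D^{(\alpha)})$, where

$$\hat\iota \circ \pi = e^{\lambda}\iota, \quad (e^{\lambda})(p) = e^{\lambda(p)}, \quad e^{\lambda(p)}\iota(p) \in \hat\iota(\hat M), \quad p \in M,$$

and $\iota$, $\hat\iota$ are the restrictions of the gradient mapping on $\Omega$ to $M$ and $\hat M$, respectively.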

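Proof. The tension field of the map $\pi$ relative to $(g, D^{(\alpha)}, \hat D^{(\alpha)})$ is described with $(M, \bar D^{(\alpha)}, \bar g)$, which is the pull-back of $(\hat M, \hat D^{(\alpha)}, \hat g)$, as follows:

$$\begin{aligned}
\tau_{(g, D^{(\alpha)}, \hat D^{(\alpha)})}(\pi)
&= \sum_{i,j=1}^{n} g^{ij}\left\{\hat D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\pi_*\frac{\partial}{\partial x^j} - \pi_*\left(D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}\right)\right\} \\
&= \sum_{i,j=1}^{n} g^{ij}\left\{\pi_*\left(\bar D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}\right) - \pi_*\left(D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}\right)\right\}
= \pi_*\left(\sum_{i,j=1}^{n} g^{ij}\left(\bar D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j} - D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}\right)\right).
\end{aligned}$$

Identifying $T_{\pi(x)}\hat M$ with $T_x M$, and considering the definition of $\pi$, we have

$$\tau_{(g, D^{(\alpha)}, \hat D^{(\alpha)})}(\pi) = e^{\lambda}\sum_{i,j=1}^{n} g^{ij}\left(\bar D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j} - D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}\right).$$

By Corollary 4.3, $(M, D^{(\alpha)}, g)$ and $(M, \bar D^{(\alpha)}, \bar g)$ are α-conformally equivalent, so that equation (2.2) holds with $\phi = \lambda$, $h = g$, $\nabla = D^{(\alpha)}$ and $\bar\nabla = \bar D^{(\alpha)}$ for $X, Y, Z \in \Gamma(TM)$. Then, similarly to the proof of Theorem 3.1, for $k \in \{1, \dots, n\}$,

$$\begin{aligned}
g\left(\tau_{(g, D^{(\alpha)}, \hat D^{(\alpha)})}(\pi), \frac{\partial}{\partial x^k}\right)
&= e^{\lambda}\sum_{i,j=1}^{n} g^{ij}\left\{g\left(\bar D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}, \frac{\partial}{\partial x^k}\right) - g\left(D^{(\alpha)}_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j}, \frac{\partial}{\partial x^k}\right)\right\} \\
&= e^{\lambda}\sum_{i,j=1}^{n} g^{ij}\left\{-\frac{1+\alpha}{2}\frac{\partial\lambda}{\partial x^k} g_{ij} + \frac{1-\alpha}{2}\left(\frac{\partial\lambda}{\partial x^i} g_{jk} + \frac{\partial\lambda}{\partial x^j} g_{ik}\right)\right\} \\
&= e^{\lambda}\left(-\frac{1+\alpha}{2}\, n + \frac{1-\alpha}{2}\cdot 2\right)\frac{\partial\lambda}{\partial x^k}
= -\frac{1}{2}\{(n+2)\alpha + (n-2)\}\, e^{\lambda}\frac{\partial\lambda}{\partial x^k}.
\end{aligned}$$

Therefore, $\tau_{(g, D^{(\alpha)}, \hat D^{(\alpha)})}(\pi) = 0$ holds if and only if, at each point of $M$, $(n+2)\alpha + (n-2) = 0$ or $\partial\lambda/\partial x^k = 0$ for all $k \in \{1, \dots, n\}$. Thus we obtain Theorem 5.1. □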


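Comparing the proofs of Theorem 3.1 and Theorem 5.1, we have the following relation between the two tension fields.

Corollary 5.2. Let $\pi: (M, D^{(\alpha)}, g) \to (\hat M, \hat D^{(\alpha)}, \hat g)$ be the map defined in Theorem 5.1, and $\pi_{id}: (M, D^{(\alpha)}, g) \to (M, \bar D^{(\alpha)}, \bar g)$ the identity map, where $(M, \bar D^{(\alpha)}, \bar g)$ is the pull-back of $(\hat M, \hat D^{(\alpha)}, \hat g)$ by $\pi$. Then it holds that

$$\tau_{(g, D^{(\alpha)}, \hat D^{(\alpha)})}(\pi) = e^{\lambda}\,\tau_{(g, D^{(\alpha)}, \bar D^{(\alpha)})}(\pi_{id}).$$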


Remark 5.2. For $n = 2$, there exist harmonic maps $\pi_{id}$ and $\pi$ with non-constant functions $\phi$ and $\lambda$, respectively, if and only if $\alpha = 0$.

Remark 5.3. For $n \geq 3$, if a map $\pi_{id}$ or $\pi$ is a harmonic map with a non-constant function $\phi$ or $\lambda$, respectively, then $-1 < \alpha < 0$.

Remark 5.4. For $\alpha \leq -1$ or $\alpha > 0$, there exist no harmonic maps $\pi_{id}$ and $\pi$ with non-constant functions $\phi$ and $\lambda$, respectively.
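Remarks 5.2 to 5.4 all follow from the single equation $(n+2)\alpha + (n-2) = 0$. A small helper, hypothetical and written only to tabulate that critical value (the function names are ours, not the paper's), makes the pattern visible:

```python
from fractions import Fraction


def critical_alpha(n: int) -> Fraction:
    """The unique alpha with (n + 2) * alpha + (n - 2) == 0, i.e. -(n - 2)/(n + 2)."""
    if n < 2:
        raise ValueError("Theorems 3.1 and 5.1 assume n >= 2")
    return Fraction(-(n - 2), n + 2)


def nonconstant_harmonic_possible(n: int, alpha: Fraction) -> bool:
    """Per the proofs of Theorems 3.1 and 5.1, a harmonic pi_id (or pi) with a
    non-constant phi (or lambda) requires alpha to equal the critical value."""
    return alpha == critical_alpha(n)


for n in range(2, 7):
    print(n, critical_alpha(n))
# 2 0
# 3 -1/5
# 4 -1/3
# 5 -3/7
# 6 -1/2
```

The critical value is 0 only for n = 2 and lies strictly between −1 and 0 for every n ≥ 3, approaching −1 as n grows, which is the content of the three remarks.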
Escort probability is a certain modification of ordinary probability, and a conformally transformed structure can be introduced on the space of its distributions. This contribution focuses on applications of escort probabilities and of such a structure. We demonstrate that they appear naturally and play important roles in a computationally efficient method for constructing α-Voronoi partitions and in the analysis of related dynamical systems on the simplex.

Keywords :  Voronoi  partitions;  dynamical  systems;  information  geometry. 


1.  Introduction 

In the research areas of multifractals and nonextensive statistical mechanics, escort probability [1-3] appears in many aspects and is widely recognized as an important concept. It has been known [4,5] that nonextensive entropies are closely connected with the α-geometry [6,7]. Further, we have geometrically studied the space of escort distributions and reported [8-10] that the well-established and abundant structure (called the dually flat structure) can be introduced by a conformal transformation of the α-geometry.

The purpose of this contribution is to show that escort probability and the associated conformal structure are also natural and useful in other applications.




A.  Ohara,  H.  Matsuzoe  &  S.-I.  Amari 


First, we discuss the Voronoi partition with respect to the α-divergence (or Rényi divergence). Voronoi partitions on the space of probability distributions with the Kullback-Leibler [11,12] or Bregman divergences [13] are useful tools for various statistical modeling problems involving pattern classification, clustering, likelihood ratio tests and so on. See also the literature [14-16] for related problems. The main advantage of using α-divergences is their invariance under transformations by sufficient statistics [7,17], which is a significant requirement for those statistical applications. Computationally, the conformal flattening of the α-geometry enables us to invoke the standard algorithm [18,19] using a potential function and an upper envelope of hyperplanes with the escort probabilities as coordinates. As another application, we explore properties of dynamical systems defined by the escort transformation and by the gradient with respect to the conformal metric. These flows are fundamental from the geometrical viewpoint [20] and turn out to possess interesting properties.

The paper is organized as follows: Sec. 2 is a short review of properties of the information geometric structure induced on the family of escort distributions, obtained by the authors [8]. Section 3 describes the first application of escort probability and the conformal geometric structure, to α-Voronoi partitions on the simplex. Properties including the computational efficiency of a construction algorithm are discussed. Further, a formula for the α-centroid is touched upon. In Sec. 4, we discuss properties of dynamical systems related with the escort transformation and gradient flows in view of the conformal geometry.

In the sequel, we use two equivalent parameters q and α, following the conventions of several research areas; their relation is fixed as $q = (1+\alpha)/2$. Additionally, we assume that $q > 0$.


2. Preliminary Results

In this section, we review and summarize results in Ref. 8. Let $S^n$ denote the $n$-dimensional probability simplex, i.e.

$$S^n := \left\{ p = (p_i) \;\middle|\; p_i > 0,\ \sum_{i=1}^{n+1} p_i = 1 \right\}, \tag{1}$$

and let $p_i$, $i = 1, \dots, n+1$ denote the probabilities of $n+1$ states. We introduce the α-geometric structure [6,7] on $S^n$. Let $\{\partial_i\}$, $i = 1, \dots, n$ be the natural basis tangent vector fields on $S^n$ defined by

$$\partial_i := \frac{\partial}{\partial p_i} - \frac{\partial}{\partial p_{n+1}}, \quad i = 1, \dots, n, \tag{2}$$

where $p_{n+1} = 1 - \sum_{i=1}^{n} p_i$. Now we define a Riemannian metric $g$ on $S^n$, called the Fisher metric:

$$g_{ij}(p) := g(\partial_i, \partial_j) = \frac{\delta_{ij}}{p_i} + \frac{1}{p_{n+1}} = \sum_{k=1}^{n+1} p_k (\partial_i \log p_k)(\partial_j \log p_k), \quad i, j = 1, \dots, n. \tag{3}$$

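Further, define a torsion-free affine connection, called the α-connection, which is represented in its coefficients with a real parameter α by

$$\Gamma^{(\alpha)}_{ij,k}(p) := g\left(\nabla^{(\alpha)}_{\partial_i}\partial_j, \partial_k\right) = \frac{1+\alpha}{2}\left(\frac{1}{p_{n+1}^2} - \frac{\delta_{ijk}}{p_i^2}\right), \quad i, j, k = 1, \dots, n, \tag{4}$$

where $\delta_{ijk}$ is equal to one if $i = j = k$ and zero otherwise. Then we have the α-covariant derivative, which gives

$$\nabla^{(\alpha)}_{\partial_i}\partial_j = \sum_{k=1}^{n} \Gamma^{(\alpha)k}_{ij}\,\partial_k, \qquad \Gamma^{(\alpha)k}_{ij} := \sum_{l=1}^{n} g^{kl}\,\Gamma^{(\alpha)}_{ij,l},$$

when it is applied to the vector fields $\partial_i$ and $\partial_j$. We can define a distance-like function on $S^n \times S^n$ for $\alpha \neq \pm 1$ by

$$D^{(\alpha)}(p, r) := \frac{4}{1-\alpha^2}\left(1 - \sum_{i=1}^{n+1} (p_i)^{(1-\alpha)/2} (r_i)^{(1+\alpha)/2}\right),$$

which we call the α-divergence. The Fisher metric $g$ and the α-connection $\nabla^{(\alpha)}$ can be derived from the α-divergence [7,21].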
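As a concrete check of this definition, the following minimal sketch evaluates $D^{(\alpha)}$ on the simplex (the function names are illustrative, not from the paper) and compares it with two special cases: the Kullback-Leibler divergence $\sum_i p_i\ln(p_i/r_i)$, which it approaches as $\alpha \to -1$, and $D^{(0)} = 2\sum_i(\sqrt{p_i} - \sqrt{r_i})^2$.

```python
import math


def alpha_divergence(p, r, alpha):
    """D^(alpha)(p, r) = 4/(1 - alpha^2) * (1 - sum_i p_i^((1-alpha)/2) r_i^((1+alpha)/2)),
    for alpha != +-1 and strictly positive probability vectors p, r."""
    if abs(alpha) == 1:
        raise ValueError("the closed form requires alpha != +-1")
    s = sum(pi ** ((1 - alpha) / 2) * ri ** ((1 + alpha) / 2) for pi, ri in zip(p, r))
    return 4.0 / (1 - alpha ** 2) * (1 - s)


def kl_divergence(p, r):
    """Kullback-Leibler divergence sum_i p_i log(p_i / r_i)."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))


p = [0.2, 0.7, 0.1]
r = [0.3, 0.3, 0.4]
print(alpha_divergence(p, r, -1 + 1e-6), kl_divergence(p, r))  # nearly equal
print(alpha_divergence(p, r, 0.0),
      2 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, r)))  # equal
```

For $|\alpha| < 1$ the prefactor is positive and the sum is at most one; for $|\alpha| > 1$ both signs flip, so $D^{(\alpha)} \geq 0$ in either case.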

Since $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ geometrically play dualistic roles [6,7] with respect to $g$, we consider the triple $(g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$, which is called the α-geometric structure on $S^n$. The properties of the Tsallis entropy are studied through the α-geometry [4,5].

While the α-geometric structure for $\alpha \neq \pm 1$ is not flat, we reported [8] that it can be flattened via a certain conformal transformation [22-25] to a nonstandard dually flat structure [6,7], denoted by $(h, \nabla, \nabla^*)$. The theoretical advantage, or interesting aspect, of such conformal flattening is that we obtain a Legendre structure on $S^n$ preserving several properties of the α-geometric structure. We summarize the result in the following proposition after preparing some notation: the escort probability [1] $P_i$ and a function $Z_q$ are respectively defined for $q \in \mathbb{R}$ by

$$P_i(p) := \frac{(p_i)^q}{Z_q(p)}, \quad i = 1, \dots, n+1, \qquad Z_q(p) := \sum_{i=1}^{n+1} (p_i)^q. \tag{5}$$

For $0 < q$ with $q \neq 1$, we define two functions by

$$\ln_q(s) := \frac{s^{1-q} - 1}{1-q}, \quad s > 0, \qquad \exp_q(t) := [1 + (1-q)t]_+^{1/(1-q)}, \quad t \in \mathbb{R},$$

where $[t]_+ := \max\{0, t\}$, and the so-called Tsallis entropy [26] by

$$S_q(p) := \frac{Z_q(p) - 1}{1-q} = \frac{\sum_{i=1}^{n+1}(p_i)^q - 1}{1-q}.$$
Note that $s = \exp_q(\ln_q(s))$ holds, and these functions respectively recover the usual logarithmic and exponential functions and the Boltzmann-Gibbs-Shannon entropy $-\sum_{i=1}^{n+1} p_i \ln p_i$ when $q \to 1$. For $q > 0$, $\ln_q(s)$ is concave on $s > 0$.
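The deformed functions above are straightforward to code. The sketch below is an illustration, not code from the paper; the names `ln_q`, `exp_q`, `escort` and `tsallis_entropy` are our own. It checks the inverse relation $s = \exp_q(\ln_q(s))$ and the $q \to 1$ limit of the Tsallis entropy.

```python
import math


def ln_q(s, q):
    """q-logarithm (s**(1-q) - 1)/(1-q) for s > 0; natural log at q == 1."""
    if s <= 0:
        raise ValueError("ln_q is defined for s > 0")
    if q == 1:
        return math.log(s)
    return (s ** (1 - q) - 1) / (1 - q)


def exp_q(t, q):
    """q-exponential [1 + (1-q) t]_+ ** (1/(1-q)); ordinary exp at q == 1."""
    if q == 1:
        return math.exp(t)
    base = 1 + (1 - q) * t
    if base <= 0:
        # cut-off of [.]_+: the power is 0 for q < 1 and diverges for q > 1
        return 0.0 if q < 1 else math.inf
    return base ** (1 / (1 - q))


def z_q(p, q):
    """Z_q(p) = sum_i p_i**q."""
    return sum(pi ** q for pi in p)


def escort(p, q):
    """Escort distribution P_i = p_i**q / Z_q(p), cf. (5)."""
    z = z_q(p, q)
    return [pi ** q / z for pi in p]


def tsallis_entropy(p, q):
    """S_q(p) = (Z_q(p) - 1)/(1 - q); Shannon entropy at q == 1."""
    if q == 1:
        return -sum(pi * math.log(pi) for pi in p)
    return (z_q(p, q) - 1) / (1 - q)


p = [0.6, 0.1, 0.3]
print(escort(p, 0.2), escort(p, 1.5))   # q < 1 flattens p, q > 1 sharpens it
print(tsallis_entropy(p, 1 + 1e-7), tsallis_entropy(p, 1))  # nearly equal
```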

Proposition 1. The dually flat structure $(h, \nabla, \nabla^*)$ on $S^n$ is induced via a conformal transformation from the α-structure $(g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ on $S^n$. The induced potential functions $\psi$, $\psi^*$ and dually flat affine coordinate systems $(\theta^1, \dots, \theta^n)$ and $(\eta_1, \dots, \eta_n)$ are represented as follows:

$$\begin{aligned}
\theta^i(p) &= \ln_q(p_i) - \ln_q(p_{n+1}), \quad i = 1, \dots, n, \\
\eta_i(p) &= P_i(p), \quad i = 1, \dots, n, \\
\psi(\theta(p)) &= -\ln_q(p_{n+1}), \\
\psi^*(\eta(p)) &= \frac{1}{\kappa}\left(\Lambda(p) - q\right),
\end{aligned}$$

where $\kappa = (1-\alpha^2)/4 = q(1-q)$ is the scalar curvature of the α-structure, $\theta^{n+1} = 0$, $\eta_{n+1} := P_{n+1}(p) = 1 - \sum_{i=1}^{n} P_i(p)$ and $\Lambda = q/Z_q$ is a conformal factor, i.e. $h = \Lambda g$. Further, the coordinate systems $(\theta^1, \dots, \theta^n)$ and $(\eta_1, \dots, \eta_n)$ are $\nabla$- and $\nabla^*$-affine, respectively.


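For the proofs of Proposition 1 and the necessary lemmas, see Ref. 27. The result is extended to the q-exponential family with continuous random variables [9,10]. Note that by defining what we call the conformal divergence $\rho$,

$$\rho(p, r) := \Lambda(r) D^{(\alpha)}(p, r) = \sum_{i=1}^{n+1} P_i(r)\left(\ln_q(r_i) - \ln_q(p_i)\right) = \psi(\theta(p)) + \psi^*(\eta(r)) - \sum_{i=1}^{n} \theta^i(p)\eta_i(r), \tag{6}$$

we can confirm the Legendre structure, i.e. the relations $\rho(p, p) = 0$, $\forall p \in S^n$, and

$$\eta_i = \frac{\partial\psi}{\partial\theta^i}, \qquad \theta^i = \frac{\partial\psi^*}{\partial\eta_i}, \qquad i = 1, \dots, n. \tag{7}$$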


The dual potential $\psi^*$ can be alternatively represented [8] in $p$ by

$$\psi^* = \ln_q\left(\frac{1}{\exp_q(S_q(p))}\right),$$

which is known as the negative of the normalized Tsallis entropy [28-30]. Thus, when $q \to 1$, we have the standard dually flat structure on $S^n$ as follows:

$$\psi \to -\ln p_{n+1}, \quad \psi^* \to \sum_{i=1}^{n+1} p_i \ln p_i, \quad \theta^i \to \ln(p_i/p_{n+1}), \quad \eta_i \to p_i, \quad i = 1, \dots, n.$$
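Relations (6) and (7) can be checked numerically. The sketch below is our own illustration, not code from Ref. 8. It evaluates $\rho(p, r)$ in three ways: as $\Lambda(r)D^{(\alpha)}(p, r)$, through escort probabilities, and through the potentials of Proposition 1. It also compares $\psi^*$ with $\ln_q(1/\exp_q(S_q))$. It assumes $q = (1+\alpha)/2$, $\kappa = q(1-q)$ and the conformal factor $\Lambda = q/Z_q$, the normalization under which the three expressions agree with the stated $\theta$, $\eta$, $\psi$ and $\psi^*$.

```python
def ln_q(s, q):
    return (s ** (1 - q) - 1) / (1 - q)          # q != 1 throughout this sketch


def exp_q(t, q):
    return (1 + (1 - q) * t) ** (1 / (1 - q))    # assumes 1 + (1-q) t > 0


def z_q(p, q):
    return sum(pi ** q for pi in p)


def escort(p, q):
    z = z_q(p, q)
    return [pi ** q / z for pi in p]


def theta(p, q):
    """theta^i = ln_q p_i - ln_q p_{n+1}, i = 1..n (Proposition 1)."""
    last = ln_q(p[-1], q)
    return [ln_q(pi, q) - last for pi in p[:-1]]


def eta(p, q):
    """eta_i = P_i(p), i = 1..n."""
    return escort(p, q)[:-1]


def psi(p, q):
    return -ln_q(p[-1], q)


def psi_star(p, q):
    kappa = q * (1 - q)
    conformal_factor = q / z_q(p, q)             # assumed Lambda = q / Z_q
    return (conformal_factor - q) / kappa


def alpha_divergence(p, r, q):
    """D^(alpha) written with q = (1 + alpha)/2: (1 - sum p^(1-q) r^q) / kappa."""
    kappa = q * (1 - q)
    return (1 - sum(pi ** (1 - q) * ri ** q for pi, ri in zip(p, r))) / kappa


def rho_three_ways(p, r, q):
    via_divergence = (q / z_q(r, q)) * alpha_divergence(p, r, q)
    via_escort = sum(Pi * (ln_q(ri, q) - ln_q(pi, q))
                     for Pi, pi, ri in zip(escort(r, q), p, r))
    via_potentials = (psi(p, q) + psi_star(r, q)
                      - sum(t * e for t, e in zip(theta(p, q), eta(r, q))))
    return via_divergence, via_escort, via_potentials


p, r = [0.2, 0.7, 0.1], [0.4, 0.4, 0.2]
for q in (0.2, 0.7, 1.5):
    print(q, rho_three_ways(p, r, q))  # three equal numbers for each q
```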
Finally, it should be remarked that both structures $(h, \nabla, \nabla^*)$ and $(g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ are related not only by the conformality of the metrics, $h = \Lambda g$, but also by the projective equivalence [31] between the connections $\nabla^*$ and $\nabla^{(-\alpha)}$,ᵃ which implies that a curve on $S^n$ is $\nabla^*$-geodesic if and only if it is $\nabla^{(-\alpha)}$-geodesic.ᵇ More generally, a submanifold in $S^n$ is $\nabla^*$-autoparallel if and only if it is $\nabla^{(-\alpha)}$-autoparallel. For $(h, \nabla, \nabla^*)$, in particular, a submanifold is $\nabla$- (resp. $\nabla^*$-) autoparallel when the affine coordinates $\theta^i$ (resp. $\eta_i$) are affinely parametrized by $\beta^j$, $j = 1, \dots, m < n$, as $\theta^i = \sum_{j=1}^{m} a^i_j \beta^j + c^i$ for $i = 1, \dots, n+1$ (similarly for $\eta_i$). For example, the q-exponential family

$$p_i = \exp_q\{\theta^i - \tilde\psi(\beta)\}, \quad i = 1, \dots, n+1, \tag{8}$$

where $\tilde\psi$ is a normalizing term defined by $\tilde\psi = \theta^{n+1} + \psi$, is $\nabla$-autoparallel in a proper domain of $\beta$. These properties are used crucially in the following sections.
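Proposition 1 with (7) implies that

$$P_i = \frac{\partial\psi}{\partial\theta^i}, \quad i = 1, \dots, n, \tag{9}$$

for $p_i = \exp_q(\theta^i - \psi)$, $i = 1, \dots, n$ and $p_{n+1} = \exp_q(-\psi)$. This relation can be regarded as a special case of a known one [3,32] for the q-exponential family (8), using the escort expectation [2],

$$\sum_{i=1}^{n+1} P_i\, a^i_j = \frac{\partial\tilde\psi}{\partial\beta^j}, \quad j = 1, \dots, m,$$

because (9) is derived when $a^i_j = \delta^i_j$, $j = 1, \dots, n$ and $a^{n+1}_j = c^i = 0$.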
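Relation (9) can be illustrated with finite differences. In the sketch below (our own illustration; `normalizer` is a hypothetical helper), $\psi(\theta)$ is recovered from the normalization $\sum_{i=1}^{n+1}\exp_q(\theta^i - \psi) = 1$ with $\theta^{n+1} = 0$ by bisection, and its numerical gradient is compared with the escort probabilities.

```python
def exp_q(t, q):
    """q-exponential with the [.]_+ cut-off; q != 1 assumed."""
    base = 1 + (1 - q) * t
    if base <= 0:
        return 0.0 if q < 1 else float("inf")
    return base ** (1 / (1 - q))


def ln_q(s, q):
    return (s ** (1 - q) - 1) / (1 - q)


def normalizer(theta, q):
    """psi solving sum_i exp_q(theta^i - psi) = 1, where theta^{n+1} = 0 is appended.
    The left-hand side is non-increasing in psi, so bisection applies."""
    full = list(theta) + [0.0]
    f = lambda psi: sum(exp_q(t - psi, q) for t in full) - 1
    lo, hi = max(full) - 1.0, max(full) + 1.0
    while f(lo) <= 0:                  # widen the bracket to the left
        lo -= 2 * (hi - lo)
    while f(hi) > 0:                   # widen the bracket to the right
        hi += 2 * (hi - lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def escort(p, q):
    z = sum(pi ** q for pi in p)
    return [pi ** q / z for pi in p]


q = 0.2
p = [0.3, 0.3, 0.4]
theta = [ln_q(pi, q) - ln_q(p[-1], q) for pi in p[:-1]]
step = 1e-5
grad = []
for i in range(len(theta)):
    plus = list(theta)
    plus[i] += step
    minus = list(theta)
    minus[i] -= step
    grad.append((normalizer(plus, q) - normalizer(minus, q)) / (2 * step))
print(grad, escort(p, q)[:-1])  # the two lists should agree to about 1e-8
```

The check works because $\exp_q'(t) = \exp_q(t)^q$, so differentiating the normalization gives $\partial\psi/\partial\theta^k = p_k^q/\sum_i p_i^q = P_k$.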


3. Applications to Construction of Alpha-Voronoi Partitions and Alpha-Centroids

For given $m$ points $p_1, \dots, p_m$ on $S^n$ we define α-Voronoi regions on $S^n$ using the α-divergence as follows:

$$\mathrm{Vor}^{(\alpha)}(p_k) := \bigcap_{l \neq k} \{p \in S^n \mid D^{(\alpha)}(p_k, p) < D^{(\alpha)}(p_l, p)\}, \quad k = 1, \dots, m.$$

An α-Voronoi partition (diagram) on $S^n$ is a collection of the α-Voronoi regions and their boundaries. Note that $D^{(\alpha)}$ approaches the Kullback-Leibler (KL) divergence as $\alpha \to -1$, and $D^{(0)}$ is called the Hellinger distance. If we use the Rényi divergence [33] of order $\alpha \neq 1$, defined by

$$D_\alpha(p, r) := \frac{1}{\alpha - 1}\ln\sum_{i=1}^{n+1} (p_i)^{\alpha}(r_i)^{1-\alpha},$$

instead of the α-divergence, $\mathrm{Vor}^{(1-2\alpha)}(p_k)$ gives the corresponding Voronoi region because of their one-to-one functional relationship.

ᵃNote that $\nabla^*$ is projectively equivalent with $\nabla^{(\alpha)}$ in Ref. 8 because there we adopted a different correspondence of parameters: $q = (1-\alpha)/2$.

ᵇPrecisely speaking, the term "geodesic" should be replaced by "pre-geodesic".
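The one-to-one relationship can be made explicit. Write the Rényi order as $a$ to avoid a clash with the α of the α-divergence, and set $\alpha' = 1 - 2a$. Then $\sum_i p_i^a r_i^{1-a} = 1 - a(1-a)D^{(\alpha')}(p, r)$, so $D_a = \ln(1 - a(1-a)D^{(\alpha')})/(a-1)$, which is increasing in $D^{(\alpha')}$ for $a > 0$, $a \neq 1$. The sketch below uses illustrative names of our own. It checks this identity and that both divergences select the same nearest site for a query point, using the four sites of Figs. 1 and 2.

```python
import math


def alpha_divergence(p, r, alpha):
    s = sum(pi ** ((1 - alpha) / 2) * ri ** ((1 + alpha) / 2) for pi, ri in zip(p, r))
    return 4.0 / (1 - alpha ** 2) * (1 - s)


def renyi_divergence(p, r, a):
    """Renyi divergence of order a (a > 0, a != 1)."""
    return math.log(sum(pi ** a * ri ** (1 - a) for pi, ri in zip(p, r))) / (a - 1)


def renyi_from_alpha(d_alpha, a):
    """Monotone map D^(1-2a) -> D_a, increasing for a > 0, a != 1."""
    return math.log(1 - a * (1 - a) * d_alpha) / (a - 1)


sites = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.4, 0.4, 0.2], [0.6, 0.1, 0.3]]
query = [0.25, 0.5, 0.25]
a = 0.8
by_renyi = min(range(4), key=lambda k: renyi_divergence(sites[k], query, a))
by_alpha = min(range(4), key=lambda k: alpha_divergence(sites[k], query, 1 - 2 * a))
print(by_renyi, by_alpha)  # same index
```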

The standard algorithm using the projection of a polyhedron [18,19] commonly works well to construct Voronoi partitions for the Euclidean distance [19] and the KL divergence [12]. The algorithm is generally applicable if a divergence function is of Bregman type [13], i.e., represented by the remainder of the first order Taylor expansion of a convex potential function in a suitable coordinate system. Geometrically speaking, this implies that i) the divergence is of the form (6) in a dually flat structure and ii) its affine coordinate system is chosen to realize the corresponding Voronoi partitions. In this coordinate system, with one extra complementary coordinate, the polyhedron is expressed as the upper envelope of m hyperplanes tangent to the potential function.

A problem for the case of the α-Voronoi partition is that the α-divergence on $S^n$ cannot be represented as a remainder of any convex potential. The following theorem, however, shows that the problem is resolved by Proposition 1, i.e. by conformally transforming the α-geometry to the dually flat structure $(h, \nabla, \nabla^*)$ and using the conformal divergence $\rho$ and the escort probabilities as a coordinate system.

Here, we denote the space of escort distributions by $\mathcal{E}^n$ and represent a point on $\mathcal{E}^n$ by $P = (P_1, \dots, P_n)$, because $P_{n+1} = 1 - \sum_{i=1}^{n} P_i$.

Theorem 1.

(i) The bisector of $p_k$ and $p_l$ defined by $\{p \mid D^{(\alpha)}(p_k, p) = D^{(\alpha)}(p_l, p)\}$ is a simultaneously $\nabla^{(-\alpha)}$- and $\nabla^*$-autoparallel hypersurface on $S^n$.

(ii) Let $H_k$, $k = 1, \dots, m$ be the hyperplane in $\mathcal{E}^n \times \mathbb{R}$ which is tangent at $(P_k, \psi^*(P_k))$ to the hypersurface $\{(P, y) \mid y = \psi^*(P)\}$, where $P_k = P(p_k)$. The α-Voronoi diagram can be constructed on $\mathcal{E}^n$ as the projection of the upper envelope of the $H_k$'s along the y-axis.

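Proof. (i) Consider the $\nabla^{(\alpha)}$-geodesic $\gamma^{(\alpha)}$ connecting $p_k$ and $p_l$, and let $p$ be the midpoint on $\gamma^{(\alpha)}$ satisfying $D^{(\alpha)}(p_k, p) = D^{(\alpha)}(p_l, p)$. Denote by $B$ the $\nabla^{(-\alpha)}$-autoparallel hypersurface that is orthogonal to $\gamma^{(\alpha)}$ and passes through $p$. Then, for all $r \in B$, the modified Pythagorean theorem [4,23] implies the following equality:

$$\begin{aligned}
D^{(\alpha)}(p_k, r) &= D^{(\alpha)}(p_k, p) + D^{(\alpha)}(p, r) - \kappa D^{(\alpha)}(p_k, p) D^{(\alpha)}(p, r) \\
&= D^{(\alpha)}(p_l, p) + D^{(\alpha)}(p, r) - \kappa D^{(\alpha)}(p_l, p) D^{(\alpha)}(p, r) = D^{(\alpha)}(p_l, r).
\end{aligned}$$

Hence, $B$ is a bisector of $p_k$ and $p_l$. The projective equivalence ensures that $B$ is also $\nabla^*$-autoparallel.

(ii) Recall the conformal relation (6) between $D^{(\alpha)}$ and $\rho$: since $\rho(p_k, p) = \Lambda(p) D^{(\alpha)}(p_k, p)$ with the same positive factor $\Lambda(p)$ for every $k$, we see that $\mathrm{Vor}^{(\alpha)}(p_k) = \mathrm{Vor}^{(\mathrm{conf})}(p_k)$ holds on $S^n$, where

$$\mathrm{Vor}^{(\mathrm{conf})}(p_k) := \bigcap_{l \neq k} \{p \in S^n \mid \rho(p_k, p) < \rho(p_l, p)\}.$$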
Proposition 1 and the Legendre relations (6) and (7) imply that $\rho(p_k, p)$ is represented with the coordinates $(P_i)$ by

$$\rho(p_k, p) = \psi^*(P) - \left\{\psi^*(P_k) + \sum_{i=1}^{n}\frac{\partial\psi^*}{\partial P_i}(P_k)\left(P_i(p) - P_i(p_k)\right)\right\},$$

where $P = P(p)$. Note that a point $(P, y_k(P))$ in $H_k$ is expressed by

$$y_k(P) := \psi^*(P_k) + \sum_{i=1}^{n}\frac{\partial\psi^*}{\partial P_i}(P_k)\left(P_i(p) - P_i(p_k)\right).$$

Hence, we have $\rho(p_k, p) = \psi^*(P) - y_k(P)$, so that $p \in \mathrm{Vor}^{(\mathrm{conf})}(p_k)$ exactly when $y_k(P) > y_l(P)$ for all $l \neq k$. We see, for example, that the bisector on $\mathcal{E}^n$ for $p_k$ and $p_l$ is represented as a projection of $H_k \cap H_l$. Thus, the statement follows. □
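Theorem 1 (ii) translates into a short procedure. The sketch below is an illustration under our own naming, not the authors' implementation. It evaluates the tangent hyperplanes $y_k(P)$ in escort coordinates and assigns a query point to the site with the highest hyperplane. It then compares that assignment with a direct minimization of $D^{(\alpha)}(p_k, p)$ over a grid on $S^2$, using the four sites of Figs. 1 and 2. It assumes $\psi^*(\eta(p)) = (\Lambda(p) - q)/\kappa$ with $\Lambda = q/Z_q$, and uses $\partial\psi^*/\partial P_i = \theta^i$ from (7).

```python
def ln_q(s, q):
    return (s ** (1 - q) - 1) / (1 - q)          # q != 1


def z_q(p, q):
    return sum(pi ** q for pi in p)


def escort_coords(p, q):
    """eta = (P_1, ..., P_n): escort probabilities without the last component."""
    z = z_q(p, q)
    return [pi ** q / z for pi in p[:-1]]


def theta(p, q):
    return [ln_q(pi, q) - ln_q(p[-1], q) for pi in p[:-1]]


def psi_star(p, q):
    return (q / z_q(p, q) - q) / (q * (1 - q))   # assumed Lambda = q / Z_q


def alpha_divergence(p, r, q):
    return (1 - sum(pi ** (1 - q) * ri ** q for pi, ri in zip(p, r))) / (q * (1 - q))


def envelope_site(sites, p, q):
    """Index k maximizing the tangent hyperplane y_k(P) at P = P(p)."""
    P = escort_coords(p, q)

    def height(k):
        Pk = escort_coords(sites[k], q)
        return psi_star(sites[k], q) + sum(
            t * (a - b) for t, a, b in zip(theta(sites[k], q), P, Pk))
    return max(range(len(sites)), key=height)


def direct_site(sites, p, q):
    """Index k minimizing D^(alpha)(p_k, p), alpha = 2q - 1."""
    return min(range(len(sites)), key=lambda k: alpha_divergence(sites[k], p, q))


def assignments_agree(sites, p, q, tol=1e-9):
    """True when both rules pick the same site, or the picked sites tie within tol."""
    k1, k2 = envelope_site(sites, p, q), direct_site(sites, p, q)
    d1, d2 = alpha_divergence(sites[k1], p, q), alpha_divergence(sites[k2], p, q)
    return k1 == k2 or abs(d1 - d2) < tol


sites = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.4, 0.4, 0.2], [0.6, 0.1, 0.3]]
grid = [[i / 40, j / 40, 1 - (i + j) / 40]
        for i in range(1, 40) for j in range(1, 40) if i + j < 40]
for q in (0.2, 1.5):                             # alpha = -0.6 and alpha = 2
    print(q, all(assignments_agree(sites, p, q) for p in grid))   # True
```

The tolerance only guards against grid points that fall numerically on a bisector, where either site is a valid answer.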


Figures 1 and 2, taken from Ref. 27, show examples of α-Voronoi partitions for the same four probability distributions on $S^2$: (0.2, 0.7, 0.1), (0.3, 0.3, 0.4), (0.4, 0.4, 0.2), (0.6, 0.1, 0.3), with $\alpha = -0.6$ and $\alpha = 2$, respectively. The left panels are represented with the usual probabilities on $S^2$ (the axis $p_3$ is omitted), while the right panels show the corresponding partitions represented with escort probabilities on $\mathcal{E}^2$. In the right panels of both figures, the bisectors are straight line segments on $\mathcal{E}^2$ because they are simultaneously $\nabla^{(-\alpha)}$- and $\nabla^*$-geodesics, as proved in (i) of Theorem 1.

Remark 1. Voronoi partitions for a broader class of divergences that are not necessarily associated with any convex potentials are studied theoretically [34] from a more general affine differential geometric point of view.

On the other hand, the α-divergence can be expressed as a Bregman divergence if the domain is extended from $S^n$ to the positive orthant $\mathbb{R}^{n+1}_+$ [5-7]. Hence, the α-geometry on $\mathbb{R}^{n+1}_+$ is dually flat. Using this property, α-Voronoi partitions on $\mathbb{R}^{n+1}_+$ are discussed by Nielsen and Nock [35].

However, while both of the above-mentioned methods require the construction of polyhedra in a space of dimension $d = n+2$, the new method proposed in this paper does so in a space of dimension $d = n+1$. Since it is known [36] that the optimal computational time for polyhedra depends on the dimension $d$ as $O(m\log m + m^{\lfloor d/2\rfloor})$, the new method is better when $n$ is even and $m$ is large: for even $n$, $\lfloor (n+1)/2\rfloor = n/2 < n/2 + 1 = \lfloor (n+2)/2\rfloor$, while for odd $n$ the two exponents coincide.

The next proposition is a simple and relevant application of escort probabilities. Define the α-centroid $c^{(\alpha)}$ for given $m$ points $p_1, \dots, p_m$ on $S^n$ as the minimizer of the following problem:

$$\min_{p \in S^n} \sum_{k=1}^{m} D^{(\alpha)}(p, p_k).$$

Proposition 2. The α-centroid $c^{(\alpha)}$ for given $m$ points $p_1, \dots, p_m$ on $S^n$ is represented in escort probabilities by the weighted average with weights proportional to $Z_q(p_k)$, i.e., inversely proportional to the conformal factors $\Lambda(p_k)$:

$$P_i(c^{(\alpha)}) = \frac{1}{\sum_{k=1}^{m} Z_q(p_k)}\sum_{k=1}^{m} Z_q(p_k) P_i(p_k), \quad i = 1, \dots, n+1.$$

Fig. 1. An example of α-Voronoi partition on $S^2$ (left) for $\alpha = -0.6$ (or $q = 0.2$) and the corresponding one on $\mathcal{E}^2$ (right).

Fig. 2. An example of α-Voronoi partition on $S^2$ (left) for $\alpha = 2$ (or $q = 1.5$) and the corresponding one on $\mathcal{E}^2$ (right).

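Proof. Let $\theta^i = \theta^i(p)$. Using (6), we have

$$\sum_{k=1}^{m} D^{(\alpha)}(p, p_k) = \sum_{k=1}^{m}\frac{\rho(p, p_k)}{\Lambda(p_k)} = \sum_{k=1}^{m}\frac{1}{\Lambda(p_k)}\left\{\psi(\theta) + \psi^*(\eta(p_k)) - \sum_{i=1}^{n}\theta^i\eta_i(p_k)\right\}.$$

Since $\psi$ is convex, this is a convex function of $\theta$, and the optimality condition is

$$\frac{\partial}{\partial\theta^i}\sum_{k=1}^{m} D^{(\alpha)}(p, p_k) = \sum_{k=1}^{m}\frac{1}{\Lambda(p_k)}\left(\eta_i - \eta_i(p_k)\right) = 0, \quad i = 1, \dots, n,$$

where $\eta_i = \eta_i(p)$. Because $1/\Lambda(p_k)$ is proportional to $Z_q(p_k)$, the statements for $i = 1, \dots, n$ follow from Proposition 1. For $i = n+1$, the statement holds since the weights sum to one. □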
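Proposition 2 gives the α-centroid without any iterative optimization: average the escort distributions with weights $Z_q(p_k)$ and map back. The sketch below is our own illustration. It uses the fact, easy to verify from (5), that the escort map of order $1/q$ inverts the escort map of order $q$. It then checks that random perturbations of the result do not decrease the objective $\sum_k D^{(\alpha)}(p, p_k)$.

```python
import math
import random


def z_q(p, q):
    return sum(pi ** q for pi in p)


def escort(p, q):
    z = z_q(p, q)
    return [pi ** q / z for pi in p]


def alpha_divergence(p, r, q):
    """D^(alpha)(p, r) with alpha = 2q - 1 (q != 1)."""
    return (1 - sum(pi ** (1 - q) * ri ** q for pi, ri in zip(p, r))) / (q * (1 - q))


def alpha_centroid(points, q):
    """Proposition 2: P_i(c) = sum_k Z_q(p_k) P_i(p_k) / sum_k Z_q(p_k)."""
    weights = [z_q(pk, q) for pk in points]
    total = sum(weights)
    escorts = [escort(pk, q) for pk in points]
    P = [sum(w * Pk[i] for w, Pk in zip(weights, escorts)) / total
         for i in range(len(points[0]))]
    return escort(P, 1 / q)          # inverse escort: p_i proportional to P_i**(1/q)


def objective(p, points, q):
    return sum(alpha_divergence(p, pk, q) for pk in points)


points = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.4, 0.4, 0.2], [0.6, 0.1, 0.3]]
rng = random.Random(1)
for q in (0.2, 1.5):
    c = alpha_centroid(points, q)
    best = objective(c, points, q)
    not_better = 0                   # count of perturbed points that do not beat c
    for _ in range(200):
        trial = [ci * math.exp(0.05 * rng.gauss(0, 1)) for ci in c]
        s = sum(trial)
        trial = [t / s for t in trial]
        not_better += objective(trial, points, q) >= best - 1e-12
    print(q, c, not_better)          # not_better should equal 200
```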

4. Related Dynamical Systems on the Simplex

In this section, we study properties of several dynamical systems naturally associated with the escort transformation, the conformal flattening and the resultant geometric structure.


4.1. Conformal replicator equation

Recall the replicator system on the simplex $S^n$ for given functions $f_i(p)$, defined by

$$\dot p_i = p_i\left(f_i(p) - \bar f(p)\right), \quad i = 1, \dots, n+1, \qquad \bar f(p) := \sum_{i=1}^{n+1} p_i f_i(p), \tag{10}$$

which is extensively studied in evolutionary game theory. It is known [37] that

(i) the solution of (10) is the gradient flow of a function $V(p)$ satisfying
$$f_i = \frac{\partial V}{\partial p_i}, \quad i = 1, \dots, n+1,$$
with respect to the Shahshahani metric,

(ii) the KL divergence is a local Lyapunov function for an equilibrium called the evolutionarily stable state (ESS).

The Shahshahani metric is defined on the positive orthant $\mathbb{R}^{n+1}_+$ by

$$\tilde g_{ij}(p) := \frac{\sum_{k=1}^{n+1} p_k}{p_i}\,\delta_{ij}, \quad i, j = 1, \dots, n+1.$$

Note that a vector $X = \sum_{i=1}^{n} X^i\partial_i$ tangent to $S^n$ is represented by a tangent vector $\tilde X$ on $\mathbb{R}^{n+1}_+$ as $\tilde X = \sum_{k=1}^{n+1}\tilde X^k\,\partial/\partial p_k$, where $\tilde X^i = X^i$, $i = 1, \dots, n$ and $\tilde X^{n+1} = -\sum_{i=1}^{n} X^i$. Then we see that the Shahshahani metric induces the Fisher metric $g$ in (3) on $S^n$, because $\sum_{i,j=1}^{n} g_{ij}X^iX^j = \sum_{k,l=1}^{n+1}\tilde g_{kl}\tilde X^k\tilde X^l$ holds. Further, the KL divergence is a canonical divergence [7] of $(g, \nabla^{(1)}, \nabla^{(-1)})$. Thus, the replicator dynamics (10) are closely related with the standard dually flat structure $(g, \nabla^{(1)}, \nabla^{(-1)})$, which is associated with exponential and mixture families of probability distributions [39].
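The claim that the Shahshahani metric, restricted to vectors tangent to $S^n$, reproduces the Fisher metric (3) is a one-line identity. It can be spot-checked numerically with the following sketch, whose helper names are ours:

```python
import random


def fisher_quadratic(p, X):
    """sum_{i,j<=n} g_ij X^i X^j with g_ij = delta_ij / p_i + 1 / p_{n+1}, cf. (3)."""
    n = len(X)
    return sum(((1 / p[i]) if i == j else 0.0) * X[i] * X[j] + X[i] * X[j] / p[n]
               for i in range(n) for j in range(n))


def shahshahani_quadratic(p, X_tilde):
    """sum_k (sum_l p_l) / p_k * (X~^k)^2 on the positive orthant."""
    total = sum(p)
    return sum(total * x * x / pk for x, pk in zip(X_tilde, p))


def lift(X):
    """X~ = (X^1, ..., X^n, -(X^1 + ... + X^n)) for X tangent to S^n."""
    return list(X) + [-sum(X)]


rng = random.Random(3)
w = [rng.random() + 0.1 for _ in range(5)]
p = [x / sum(w) for x in w]            # a point of S^4
X = [rng.uniform(-1, 1) for _ in range(4)]
print(fisher_quadratic(p, X), shahshahani_quadratic(p, lift(X)))  # equal
```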

In this subsection, motivated by the above two features (i) and (ii), we define a modified replicator system compatible with the dually flat structure $(h, \nabla, \nabla^*)$ and discuss its properties. See Harper [40] for another modification of the replicator system.
Consider a metric on $\mathbb{R}^{n+1}_+$ defined by $\tilde h := \Lambda\tilde g$ and the following modified replicator system:

$$\dot p_i = Z_q(p)\,p_i\left(f_i(p) - \bar f(p)\right), \quad i = 1, \dots, n+1. \tag{11}$$

It is easy to see that the above right-hand sides define a vector tangent to $S^n$ and the gradient of a function $V$ with respect to $\tilde h$, since $\sum_{i=1}^{n+1}\dot p_i = 0$ and

$$\tilde h(\tilde X, \dot p) = \sum_{i,j=1}^{n+1}\tilde h_{ij}\tilde X^i\dot p_j = \sum_{i=1}^{n+1} f_i\tilde X^i - \bar f\sum_{i=1}^{n+1}\tilde X^i = \sum_{i=1}^{n+1}\frac{\partial V}{\partial p_i}\tilde X^i,$$

respectively, hold for any tangent vector $X$ on $S^n$. Thus, comparing (10) and (11), we can conclude as follows:

Proposition 3. The gradient flow of a function $V$ on $S^n$ with respect to the conformal metric $h$ is given by (11). Its trajectories coincide with those of (10), while the velocities of the time evolutions differ by the factor $Z_q(p)$.


We investigate properties of (11) in the case that $V(p) = -\rho(r, p)$ for a fixed distribution $r$. Applying the result for gradient flows of divergences on dually flat spaces [20], we see that the flow is explicitly given in the $\nabla$-affine coordinates by

$$\theta^i(p(t)) = e^{-t}\{\theta^i(p(0)) - \theta^i(r)\} + \theta^i(r), \quad i = 1, \dots, n, \tag{12}$$

i.e. it converges to $r$ along the $\nabla$-geodesic (pre-geodesic) curve.
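Formula (12) can be turned into a trajectory in $p$-coordinates by inverting $\theta$. This means solving $\sum_{i=1}^{n+1}\exp_q(\theta^i - \psi) = 1$ for $\psi$ (with $\theta^{n+1} = 0$) and setting $p_i = \exp_q(\theta^i - \psi)$. The sketch below is our own illustration with hypothetical helper names. It checks that $p(t) \to r$ and that $\rho(r, p(t))$ decreases along the flow, as expected for the gradient flow of $V = -\rho(r, \cdot)$.

```python
import math


def ln_q(s, q):
    return (s ** (1 - q) - 1) / (1 - q)          # q != 1


def exp_q(t, q):
    base = 1 + (1 - q) * t
    if base <= 0:
        return 0.0 if q < 1 else math.inf
    return base ** (1 / (1 - q))


def theta(p, q):
    return [ln_q(pi, q) - ln_q(p[-1], q) for pi in p[:-1]]


def point_from_theta(th, q):
    """Invert theta: find psi with sum_i exp_q(theta^i - psi) = 1 by bisection."""
    full = list(th) + [0.0]
    f = lambda s: sum(exp_q(t - s, q) for t in full) - 1
    lo, hi = max(full) - 1.0, max(full) + 1.0
    while f(lo) <= 0:
        lo -= 2 * (hi - lo)
    while f(hi) > 0:
        hi += 2 * (hi - lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    psi = 0.5 * (lo + hi)
    return [exp_q(t - psi, q) for t in full]


def rho(p, r, q):
    """Conformal divergence (6): sum_i P_i(r) (ln_q r_i - ln_q p_i)."""
    z = sum(ri ** q for ri in r)
    return sum(ri ** q / z * (ln_q(ri, q) - ln_q(pi, q)) for pi, ri in zip(p, r))


def flow(p0, r, q, t):
    """Closed form (12): theta(p(t)) = e^{-t} (theta(p0) - theta(r)) + theta(r)."""
    th0, thr = theta(p0, q), theta(r, q)
    return point_from_theta([math.exp(-t) * (a - b) + b for a, b in zip(th0, thr)], q)


q, p0, r = 0.2, [0.6, 0.1, 0.3], [0.2, 0.7, 0.1]
for t in (0.0, 0.5, 1.0, 2.0, 8.0):
    pt = flow(p0, r, q, t)
    print(t, [round(x, 4) for x in pt], rho(r, pt, q))   # rho(r, p(t)) decreases
```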

On the other hand, consider the optimization problem of maximizing $V(p) =
-\rho(r,p)$ under $m$ constraints on the escort expectations:

$$\langle A^j \rangle_q := \sum_{i=1}^{n+1} A_i^j P_i(p) = \sum_{i=1}^{n} \eta_i(p)\bigl(A_i^j - A_{n+1}^j\bigr) + A_{n+1}^j = \mathcal{A}_j, \qquad j = 1, \dots, m, \tag{13}$$

where $A_i^j$ and $\mathcal{A}_j$ are prescribed values. Since the constraints (13) form a $\nabla^*$-autoparallel submanifold in $\mathcal{S}^n$, the problem has a unique maximizer owing to
the Pythagorean theorem$^{6,7}$ in a dually flat space. Defining the Lagrangian

$$L(p) := \rho(r,p) + \sum_{j=1}^{m} \beta_j\bigl(\mathcal{A}_j - \langle A^j \rangle_q\bigr),$$

we obtain the following optimality condition from (6) and (7):

$$\frac{\partial L}{\partial \eta_i} = \theta^i - \theta_r^i - \sum_{j=1}^{m} \beta_j\bigl(A_i^j - A_{n+1}^j\bigr) = 0, \qquad i = 1, \dots, n,$$
where $\theta^i$ and $\eta_i$ are, respectively, the $\nabla$- and the $\nabla^*$-affine coordinates of $p$ introduced
in Theorem 1, and $\theta_r^i := \theta^i(r)$. Hence, $\theta^i$ is affine with respect to $\beta_j$ and the
maximizer $p$ is in the $q$-exponential family represented in (8). These facts imply
that the set of maximizers forms a $\nabla$-autoparallel submanifold parametrized by $\beta_j$,
which are determined by the prescribed values $\mathcal{A}_j$.
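Because $\eta_i = P_i$ for $i = 1, \dots, n$ and $P_{n+1} = 1 - \sum_{i=1}^{n} \eta_i$, each escort expectation in (13) is an affine function of $\eta$ with slopes $A_i^j - A_{n+1}^j$; this is why the constraint set is $\nabla^*$-autoparallel and why these differences appear in the optimality condition. A small numerical check, assuming that the $\nabla^*$-affine coordinates are the escort probabilities $\eta_i = P_i(p)$ and using randomly drawn (hypothetical) statistics $A_i^j$:

```python
import numpy as np

def escort(p, q):
    """Escort distribution P_i = p_i**q / Z_q(p)."""
    w = p ** q
    return w / w.sum()

rng = np.random.default_rng(1)
n1, m, q = 5, 2, 0.8                      # n + 1 outcomes, m constraints
A = rng.normal(size=(m, n1))              # hypothetical statistics A^j_i (row j, column i)
p = rng.dirichlet(np.ones(n1))            # an interior point of S^n

P = escort(p, q)
eta = P[:-1]                              # assumed eta_i = P_i, i = 1..n
lhs = A @ P                               # <A^j>_q = sum_{i=1}^{n+1} A^j_i P_i
rhs = (A[:, :-1] - A[:, -1:]) @ eta + A[:, -1]   # affine form in eta, as in (13)
print(np.allclose(lhs, rhs))
```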

Combining  this  consideration  with  (12),  we  see  that  the  following  holds: 

Corollary 1. Let $r$ be any distribution, and suppose that $p_0$ and $p_\infty$ are in the
$q$-exponential family (8) parametrized by $\beta$ as $\theta^i = \sum_{j=1}^{m}(A_i^j - A_{n+1}^j)\beta_j + \theta_r^i$, $i =
1, \dots, n$, and $\theta^{n+1} = 0$. Then the gradient flow (11) with $V(p) = -\rho(p_\infty, p)$ starting from
$p_0$ converges to $p_\infty$ while staying on the $q$-exponential family.

In the above, $p_0$ and $p_\infty$ are interpreted, respectively, as maximizers of $-\rho(r,p)$
under the constraints (13) with different values of the $\mathcal{A}_j$'s. The corollary states that
the $q$-exponential family is an invariant manifold for the transition of the distribution
from $p_0$ to $p_\infty$ caused by the change of the $\mathcal{A}_j$'s, if the transition dynamics are governed
by the gradient flow.
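The invariance can be seen directly from (12) with $r = p_\infty$: the flow moves on the straight line in $\theta$ through $\theta(p_0)$ and $\theta(p_\infty)$, which corresponds to $\beta(t) = e^{-t}(\beta_0 - \beta_\infty) + \beta_\infty$. The Python sketch below works purely in $\theta$-coordinates ($i = 1, \dots, n$; $\theta^{n+1} = 0$ is implicit), draws random hypothetical values for $A_i^j$, $\theta_r$, $\beta_0$ and $\beta_\infty$, and recovers $\beta(t)$ by least squares; the residual should vanish up to rounding.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2
A = rng.normal(size=(m, n + 1))           # hypothetical statistics A^j_i
B = (A[:, :-1] - A[:, -1:]).T             # n x m matrix with entries A^j_i - A^j_{n+1}
theta_r = rng.normal(size=n)              # theta^i(r) for a fixed reference r

def family(beta):
    """theta-coordinates of the q-exponential family member with parameter beta."""
    return B @ beta + theta_r

beta0, beta_inf = rng.normal(size=m), rng.normal(size=m)
th0, th_inf = family(beta0), family(beta_inf)

for t in np.linspace(0.0, 5.0, 6):
    th_t = np.exp(-t) * (th0 - th_inf) + th_inf            # flow (12) with r = p_inf
    beta_t, *_ = np.linalg.lstsq(B, th_t - theta_r, rcond=None)
    # A zero residual means theta(t) is still a member of the family.
    print(round(float(t), 1), np.linalg.norm(family(beta_t) - th_t))
```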


4.2.  Flows  of  escort  transformation 


Consider a dynamical system induced by the escort transformation from $p$ to $P$
defined by (5). When we identify the set of escort distributions $\mathcal{E}^n$ with $\mathcal{S}^n$, the
transformation can be regarded as defining a flow $P^{(t)}$ on $\mathcal{S}^n$ parametrized by $t \in \mathbb{R}$:

$$P_i^{(t)} = \frac{(p_i)^t}{\sum_{j=1}^{n+1} (p_j)^t}, \qquad i = 1, \dots, n+1, \qquad P^{(1)} = p \in \mathcal{S}^n, \tag{14}$$

where $p$ is a fixed probability distribution.
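A direct implementation of (14) is short. The sketch below evaluates it in log space, subtracting the largest exponent before exponentiating, so that large $|t|$ does not overflow; the function name `escort_flow` and the sample distribution are our own illustrative choices.

```python
import numpy as np

def escort_flow(p, t):
    """P^(t)_i = p_i**t / sum_j p_j**t, computed in log space for numerical stability."""
    log_w = t * np.log(p)
    log_w -= log_w.max()          # the largest weight becomes exp(0) = 1
    w = np.exp(log_w)
    return w / w.sum()

p = np.array([0.5, 0.3, 0.2])
print(escort_flow(p, 1.0))        # recovers p, since P^(1) = p
print(escort_flow(p, 0.0))        # the uniform distribution
print(escort_flow(p, 2.0))        # the usual escort distribution with exponent 2
```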

Recalling the standard dually flat structure, which is obtained in the limit $q \to 1$
(or $\alpha \to 1$) in Proposition 1, we have the corresponding coordinates$^{c}$ $\theta_p^i := \theta^i(p) =
\ln(p_i) - \ln(p_{n+1})$, $i = 1, \dots, n$. In this case, if the coordinates $\theta^i(t)$ of a curve on $\mathcal{S}^n$ are affine
functions of $t \in \mathbb{R}$, we call the curve an e-geodesic.$^{7}$

Since  it  follows  that 


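$$\theta^i(t) := \theta^i(P^{(t)}) = \ln\frac{(p_i)^t}{\sum_{j=1}^{n+1} (p_j)^t} - \ln\frac{(p_{n+1})^t}{\sum_{j=1}^{n+1} (p_j)^t} = t\bigl(\ln p_i - \ln p_{n+1}\bigr) = t\,\theta_p^i, \qquad i = 1, \dots, n,$$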


we conclude, from the viewpoint of information geometry, that the flow of the escort
transformation (14) evolves along the e-geodesic curve that passes through $p$ at $t = 1$.
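The e-geodesic property is easy to confirm numerically: computing $\theta^i = \ln P_i - \ln P_{n+1}$ along (14) should give exactly $t\,\theta_p^i$ at every time. The distribution $p$ and the sample times below are arbitrary illustrative choices.

```python
import numpy as np

def theta(p):
    """Canonical (q -> 1) coordinates theta^i = ln p_i - ln p_{n+1}, i = 1..n."""
    return np.log(p[:-1]) - np.log(p[-1])

p = np.array([0.5, 0.3, 0.2])
theta_p = theta(p)
for t in (-1.0, 0.0, 0.5, 1.0, 3.0):
    P_t = p ** t / np.sum(p ** t)                  # escort flow (14)
    print(t, np.allclose(theta(P_t), t * theta_p))  # affine (here linear) in t
```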

Note that every flow (14) passes through the uniform distribution at $t = 0$, independently
of $p$. On the other hand, as $t \to +\infty$ ($t \to -\infty$), it converges to the uniform distribution on the
indices where $p$ attains its maximum (minimum), which lies on the boundary of $\mathcal{S}^n$ unless $p$ itself is uniform.
See Ref. 41 for a relevant work. Examples of physical models with a time evolution of the power index
of distribution functions are reported in several works.$^{42,43}$
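The limiting behaviour can be observed with large $|t|$ in a numerically stable implementation of (14). In the hypothetical example below the maximum of $p$ is attained twice, so the $t \to +\infty$ limit splits the mass evenly between those two components, while $t \to -\infty$ concentrates it on the single minimum.

```python
import numpy as np

def escort_flow(p, t):
    """P^(t)_i = p_i**t / sum_j p_j**t, computed in log space for numerical stability."""
    log_w = t * np.log(p)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

p = np.array([0.4, 0.4, 0.15, 0.05])           # maximum attained twice, unique minimum
print(np.round(escort_flow(p, 200.0), 6))      # mass split over the two argmax components
print(np.round(escort_flow(p, -200.0), 6))     # mass on the argmin component
print(np.round(escort_flow(p, 1e-9), 6))       # essentially uniform near t = 0
```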


$^{c}$These coordinates are called the canonical parameters in the statistics literature.
The above result can be slightly generalized with a projective transformation
$\Pi_r : \mathcal{S}^n \to \mathcal{S}^n$ defined by

$$p = (p_i) \mapsto \Pi_r(p) := \left(\frac{r_i p_i}{\sum_{j=1}^{n+1} r_j p_j}\right), \qquad i = 1, \dots, n+1,$$

for a given vector $r = (r_i) \in \mathbb{R}_+^{n+1}$, and its relation with the replicator equation is
elucidated.
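Since $\Pi_r$ is unchanged when $r$ is multiplied by a positive constant, only the direction of $r$ matters, which is why the normalized vector $\bar r = r/\|r\|_1$ appears in Proposition 4. A minimal Python sketch, assuming $r$ has positive entries (the numbers are illustrative):

```python
import numpy as np

def Pi(r, p):
    """Projective transformation Pi_r(p)_i = r_i p_i / sum_j r_j p_j."""
    y = r * p
    return y / y.sum()

r = np.array([1.0, 2.0, 5.0])
p = np.array([0.5, 0.3, 0.2])
uniform = np.full(3, 1.0 / 3.0)
r_bar = r / np.abs(r).sum()                      # r / ||r||_1

print(np.allclose(Pi(r, p), Pi(10.0 * r, p)))    # only the direction of r matters
print(np.allclose(Pi(r, uniform), r_bar))        # the uniform distribution is mapped to r_bar
print(np.isclose(Pi(r, p).sum(), 1.0))           # the image lies on the simplex
```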

Proposition 4. For arbitrary $r$, the projective transformation of the escort flow
given in (14) evolves along the e-geodesic curve that passes through $\bar r = r/\|r\|_1$ at $t = 0$
and through $\Pi_r(p)$ at $t = 1$. This flow evolves along the trajectory of the replicator equation
(10) with the constants $f_i = \ln(p_i)$, $i = 1, \dots, n+1$.

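Proof. The first statement follows from a direct calculation of the coordinates $\theta^i$ for the
standard dually flat structure when $q \to 1$ ($\alpha \to 1$):

$$\theta^i\bigl(\Pi_r(P^{(t)})\bigr) = \ln\bigl(r_i P_i^{(t)}\bigr) - \ln\bigl(r_{n+1} P_{n+1}^{(t)}\bigr) = t\,\theta_p^i + \ln(r_i/r_{n+1}), \qquad i = 1, \dots, n,$$

which is affine in $t$ and equals $\theta^i(\bar r) = \ln(r_i/r_{n+1})$ at $t = 0$ and $\theta^i(\Pi_r(p))$ at $t = 1$.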

To prove the second statement, note that the flow $\Pi_r(P^{(t)})$ is the normalization
of a vector $y(t)$ whose components are $y_i(t) = r_i p_i^t$. Hence, $y(t)$ satisfies
the following linear differential equation:

$$\dot y_i = \ln(p_i)\,y_i, \qquad y_i(0) = r_i, \qquad i = 1, \dots, n+1.$$

By setting $x_i = y_i/\|y\|_1$, we have

$$\frac{d}{dt}\ln(x_i) = \ln(p_i) - \frac{1}{\|y\|_1}\sum_{j=1}^{n+1} \dot y_j = \ln(p_i) - \sum_{j=1}^{n+1} x_j \ln(p_j), \qquad i = 1, \dots, n+1.$$

Thus, $\Pi_r(P^{(t)})$ is the solution of

$$\dot x_i = x_i\Bigl(\ln(p_i) - \sum_{j=1}^{n+1} \ln(p_j)\,x_j\Bigr), \qquad x_i(0) = \bar r_i, \qquad i = 1, \dots, n+1.$$

This proves the second statement. □
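Both statements of Proposition 4 can be checked numerically. The sketch below (illustrative $p$, $r$ and horizon $T$; hypothetical helper names) compares the closed form $\Pi_r(P^{(T)})$ with (i) the affine expression for $\theta^i$ from the first part of the proof and (ii) a fixed-step Runge-Kutta integration of the replicator equation with constant $f_i = \ln p_i$ started at $\bar r$.

```python
import numpy as np

def projected_escort_flow(p, r, t):
    """Pi_r(P^(t)): normalize y_i(t) = r_i * p_i**t (in log space for stability)."""
    log_y = np.log(r) + t * np.log(p)
    y = np.exp(log_y - log_y.max())
    return y / y.sum()

def theta(x):
    """Canonical (q -> 1) coordinates theta^i = ln x_i - ln x_{n+1}."""
    return np.log(x[:-1]) - np.log(x[-1])

def replicator_rhs(x, f):
    """Replicator equation (10) with a constant fitness vector f."""
    return x * (f - np.dot(x, f))

def rk4(x, f, dt, steps):
    """Fixed-step classical Runge-Kutta integration of replicator_rhs."""
    for _ in range(steps):
        k1 = replicator_rhs(x, f)
        k2 = replicator_rhs(x + 0.5 * dt * k1, f)
        k3 = replicator_rhs(x + 0.5 * dt * k2, f)
        k4 = replicator_rhs(x + dt * k3, f)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

p = np.array([0.5, 0.3, 0.2])
r = np.array([1.0, 2.0, 5.0])          # any vector with positive entries
T = 3.0

# First statement: theta is affine in t and equals theta(r_bar) = ln(r_i / r_{n+1}) at t = 0.
x_T = projected_escort_flow(p, r, T)
print(np.allclose(theta(x_T), T * theta(p) + np.log(r[:-1] / r[-1])))

# Second statement: the replicator equation with f_i = ln p_i, started at r_bar,
# reaches the same point at time T.
x_num = rk4(r / r.sum(), np.log(p), dt=T / 3000, steps=3000)
print(np.max(np.abs(x_num - x_T)))
```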

5.  Concluding  Remarks 

We have discussed two applications of escort probabilities and the dually flat structure
$(h, \nabla, \nabla^*)$ on $\mathcal{S}^n$ induced by conformal transformations of the $\alpha$-geometry.
These applications point to new directions beyond the studies of multifractal or nonextensive
statistical physics.

We first demonstrate a direct application of the conformal flattening to the
computation of $\alpha$-Voronoi partitions and $\alpha$-centroids. Escort probabilities are found to
work as a suitable coordinate system for this purpose. Further, the conformal divergence
and the projective equivalence of affine connections also play important roles.
In the behavioral analysis of dynamical systems, we present the properties of gradient
flows with respect to the conformal metric and discuss their relation with the replicator
equation. Next, we show that the projective transformation of the escort flow is an
e-geodesic. This flow describes a time evolution of the power index of distributions.

Physical  interpretation  of  the  obtained  conformal  structure  is  another  future 
research  direction. 